METHOD AND DEVICE FOR NOISE REDUCTION



CROSS-REFERENCE TO RELATED APPLICATIONS 

[0001] This application is a national stage application under 35 USC §371(c) of PCT
Application No. PCT/BE2004/000103, entitled "Method and Device for Noise Reduction," 
filed on July 12, 2004, which claims the priority of Australian Patent No. 2003903575, filed 
on July 11, 2003, and Australian Patent No. 2004901931, filed on April 8, 2004. The entire 
disclosure and contents of the above applications are hereby incorporated by reference herein. 

BACKGROUND 

Field of the Invention 

[0002] The present invention is related to a method and device for adaptively reducing
the noise in speech communication applications.

Related Art

[0003] There are a variety of medical implants which deliver electrical stimulation to a 
patient or recipient ("recipient" herein) for a variety of therapeutic benefits. For example, the 
hair cells of the cochlea of a normal healthy ear convert acoustic signals into nerve impulses. 
People who are profoundly deaf due to the absence or destruction of cochlea hair cells are 
unable to derive suitable benefit from conventional hearing aid systems. Prosthetic hearing 
implant systems have been developed to provide such persons with the ability to perceive 
sound. Prosthetic hearing implant systems bypass the hair cells in the cochlea to directly 
deliver electrical stimulation to auditory nerve fibers, thereby allowing the brain to perceive a 
hearing sensation resembling the natural hearing sensation. 

[0004] The electrodes implemented in stimulating medical implants vary according to the 
device and tissue which is to be stimulated. For example, the cochlea is tonotopically 
mapped and partitioned into regions, with each region being responsive to stimulus signals in 
a particular frequency range. To accommodate this property of the cochlea, prosthetic 
hearing implant systems typically include an array of electrodes each constructed and 
arranged to deliver an appropriate stimulating signal to a particular region of the cochlea.

[0005] To achieve an optimal electrode position close to the inside wall of the cochlea, the
electrode assembly should assume this desired position upon or immediately following 
implantation into the cochlea. It is also desirable that the electrode assembly be shaped such 
that the insertion process causes minimal trauma to the sensitive structures of the cochlea. 
Usually the electrode assembly is held in a straight configuration at least during the initial 
stages of the insertion procedure, conforming to the natural shape of the cochlea once
implantation is complete. 

[0006] Prosthetic hearing implant systems typically have two primary components: an 
external component commonly referred to as a speech processor, and an implanted 
component commonly referred to as a receiver/stimulator unit. Traditionally, both of these 
components cooperate with each other to provide sound sensations to a recipient. 

[0007] The external component traditionally includes a microphone that detects sounds, 
such as speech and environmental sounds, a speech processor that selects and converts certain 
detected sounds, particularly speech, into a coded signal, a power source such as a battery, 
and an external transmitter antenna. 

[0008] The coded signal output by the speech processor is transmitted transcutaneously to 
the implanted receiver/stimulator unit, commonly located within a recess of the temporal 
bone of the recipient. This transcutaneous transmission occurs via the external transmitter 
antenna which is positioned to communicate with an implanted receiver antenna disposed 
within the receiver/stimulator unit. This communication transmits the coded sound signal 
while also providing power to the implanted receiver/stimulator unit. Conventionally, this 
link has been in the form of a radio frequency (RF) link, but other communication and power 
links have been proposed and implemented with varying degrees of success. 

[0009] The implanted receiver/stimulator unit traditionally includes the noted receiver 
antenna that receives the coded signal and power from the external component. The 
implanted unit also includes a stimulator that processes the coded signal and outputs an 
electrical stimulation signal to an intra-cochlea electrode assembly mounted to a carrier 
member. The electrode assembly typically has a plurality of electrodes that apply the 
electrical stimulation directly to the auditory nerve to produce a hearing sensation 
corresponding to the original detected sound.

SUMMARY

[0010] In one aspect of the present invention, a method to reduce noise in a noisy speech 
signal is disclosed. The method comprises applying at least two versions of the noisy speech
signal to a first filter, whereby that first filter outputs a speech reference signal and at least 
one noise reference signal, applying a filtering operation to each of the at least one noise 
reference signals, and subtracting from the speech reference signal each of the filtered noise 
reference signals, wherein the filtering operation is performed with filters having filter 
coefficients determined by taking into account speech leakage contributions in the at least 
one noise reference signal. 

[0011] In another aspect of the invention, a signal processing circuit for reducing noise
in a noisy speech signal is disclosed. This signal processing circuit comprises a first filter
having at least two inputs and arranged for outputting a speech reference signal and at least 
one noise reference signal, a filter to apply the speech reference signal to and filters to apply 
each of the at least one noise reference signals to, and summation means for subtracting from 
the speech reference signal the filtered speech reference signal and each of the filtered noise 
reference signals. 

BRIEF DESCRIPTION OF THE DRAWINGS 

[0012] Fig. 1 represents the concept of the Generalised Sidelobe Canceller in accordance 
with one embodiment of the present invention. 

[0013] Fig. 2 represents an equivalent approach of multi-channel Wiener filtering in 
accordance with one embodiment of the present invention. 

[0014] Fig. 3 represents a Spatially Pre-processed SDW-MWF in accordance with one 
embodiment of the present invention. 

[0015] Fig. 4 represents the decomposition of SP-SDW-MWF with $\mathbf{w}_0$ in a multi-channel
filter $\mathbf{w}_d$ and single-channel postfilter $\mathbf{e}_1 - \mathbf{w}_0$ in accordance with one embodiment of the present
invention.

[0016] Fig. 5 represents the set-up for the experiments in accordance with one 
embodiment of the present invention.

[0017] Fig. 6 represents the influence of $1/\mu$ on the performance of the SDR-GSC for
different gain mismatches $\Upsilon_2$ at the second microphone in accordance with one embodiment
of the present invention.

[0018] Fig. 7 represents the influence of $1/\mu$ on the performance of the SP-SDW-MWF
with $\mathbf{w}_0$ for different gain mismatches $\Upsilon_2$ at the second microphone in accordance with one
embodiment of the present invention.

[0019] Fig. 8 represents the $\Delta\mathrm{SNR}_{\mathrm{intellig}}$ and $\mathrm{SD}_{\mathrm{intellig}}$ for QIC-GSC as a function of $\beta^2$ for
different gain mismatches $\Upsilon_2$ at the second microphone in accordance with one embodiment
of the present invention.

[0020] Fig. 9 represents the complexity of TD and FD Stochastic Gradient (SG) algorithm 
with LP filter as a function of filter length L per channel; M=3 (for comparison, the 
complexity of the standard NLMS ANC and SPA are depicted too) in accordance with one 
embodiment of the present invention. 

[0021] Fig. 10 represents the performance of different FD Stochastic Gradient (FD-SG) 
algorithms; (a) Stationary speech-like noise at 90°; (b) Multi-talker babble noise at 90° in 
accordance with one embodiment of the present invention. 

[0022] Fig. 11 represents the influence of the LP filter on performance of FD stochastic
gradient SP-SDW-MWF ($1/\mu = 0.5$) without $\mathbf{w}_0$ and with $\mathbf{w}_0$, for babble noise at 90°, in
accordance with one embodiment of the present invention.

[0023] Fig. 12 represents the convergence behaviour of FD-SG for $\lambda = 0$ and $\lambda = 0.9998$.
The noise source position suddenly changes from 90° to 180° and vice versa in accordance 
with one embodiment of the present invention. 

[0024] Fig. 13 represents the performance of FD stochastic gradient implementation of 
SP-SDW-MWF with LP filter ($\lambda = 0.9998$) in a multiple noise source scenario in accordance
with one embodiment of the present invention. 

[0025] Fig. 14 represents the performance of FD SPA in a multiple noise source scenario 
in accordance with one embodiment of the present invention.

[0026] Fig. 15 represents the SNR improvement of the frequency-domain SP-SDW-MWF
(Algorithm 2 and Algorithm 4) in a multiple noise source scenario in accordance with one 
embodiment of the present invention. 

[0027] Fig. 16 represents the speech distortion of the frequency-domain SP-SDW-MWF 
(Algorithm 2 and Algorithm 4) in a multiple noise source scenario in accordance with one 
embodiment of the present invention. 

DETAILED DESCRIPTION 
[0028] In speech communication applications, such as teleconferencing, hands-free 
telephony and hearing aids, the presence of background noise may significantly reduce the 
intelligibility of the desired speech signal. Hence, the use of a noise reduction algorithm is 
necessary. Multi-microphone systems exploit spatial information in addition to temporal and 
spectral information of the desired signal and noise signal and are thus preferred to single 
microphone procedures. For aesthetic reasons, multi-microphone techniques for, e.g.,
hearing aid applications go together with the use of small-sized arrays. Considerable noise
reduction can be achieved with such arrays, but at the expense of an increased sensitivity to
errors in the assumed signal model such as microphone mismatch and reverberation (see e.g.
Stadler & Rabinowitz, 'On the potential of fixed arrays for hearing aids', J. Acoust. Soc.
Amer., vol. 94, no. 3, pp. 1332-1342, Sep. 1993). In hearing aids, microphones are rarely
matched in gain and phase. Gain and phase differences between microphone characteristics
can amount to up to 6 dB and 10°, respectively.

[0029] A widely studied multi-channel adaptive noise reduction algorithm is the
Generalised Sidelobe Canceller (GSC) (see e.g. Griffiths & Jim, 'An alternative approach to
linearly constrained adaptive beamforming', IEEE Trans. Antennas Propag., vol. 30, no. 1,
pp. 27-34, Jan. 1982 and US5473701 'Adaptive microphone array'). The GSC consists of a
fixed, spatial pre-processor, which includes a fixed beamformer and a blocking matrix, and 
an adaptive stage based on an Adaptive Noise Canceller (ANC). The ANC minimizes the 
output noise power while the blocking matrix should avoid speech leakage into the noise 
references. The standard GSC assumes the desired speaker location, the microphone 
characteristics and positions to be known, and reflections of the speech signal to be absent. If 
these assumptions are fulfilled, it provides an undistorted enhanced speech signal with 
minimum residual noise. However, in reality these assumptions are often violated, resulting
in so-called speech leakage and hence speech distortion. To limit speech distortion, the ANC
is typically adapted during periods of noise only. When used in combination with small-sized
arrays, e.g., in hearing aid applications, an additional robustness constraint (see Cox et al.,
'Robust adaptive beamforming', IEEE Trans. Acoust. Speech and Signal Processing, vol. 35,
no. 10, pp. 1365-1376, Oct. 1987) is required to guarantee performance in the presence of
small errors in the assumed signal model, such as microphone mismatch. A widely applied
method consists of imposing a Quadratic Inequality Constraint on the ANC (QIC-GSC). For
Least Mean Squares (LMS) updating, the Scaled Projection Algorithm (SPA) is a simple and
effective technique that imposes this constraint. However, using the QIC-GSC comes at the
expense of less noise reduction.

[0030] A Multi-channel Wiener Filtering (MWF) technique has been proposed (see
Doclo & Moonen, 'GSVD-based optimal filtering for single and multimicrophone speech
enhancement', IEEE Trans. Signal Processing, vol. 50, no. 9, pp. 2230-2244, Sep. 2002) that 
provides a Minimum Mean Square Error (MMSE) estimate of the desired signal portion in 
one of the received microphone signals. In contrast to the ANC of the GSC, the MWF is able 
to take speech distortion into account in its optimisation criterion, resulting in the Speech 
Distortion Weighted Multi-channel Wiener Filter (SDW-MWF). The (SDW-)MWF 
technique is uniquely based on estimates of the second order statistics of the recorded speech 
signal and the noise signal. A robust speech detection is thus again needed. In contrast to the 
GSC, the (SDW-)MWF does not make any a priori assumptions about the signal model such 
that no or a less severe robustness constraint is needed to guarantee performance when used 
in combination with small-sized arrays. Especially in complicated noise scenarios such as 
multiple noise sources or diffuse noise, the (SDW-)MWF outperforms the GSC, even when 
the GSC is supplemented with a robustness constraint. 

[0031] A possible implementation of the (SDW-)MWF is based on a Generalised
Singular Value Decomposition (GSVD) of an input data matrix and a noise data matrix. A
cheaper alternative based on a QR Decomposition (QRD) has been proposed in Rombouts & 
Moonen, 'QRD-based unconstrained optimal filtering for acoustic noise reduction ', Signal 
Processing, vol. 83, no. 9, pp. 1889-1904, Sep. 2003. Additionally, a subband implementation 
results in improved intelligibility at a significantly lower cost compared to the fullband 
approach. However, in contrast to the GSC and the QIC-GSC, no cheap stochastic gradient 
based implementation of the (SDW-)MWF is available yet. In Nordholm et al., 'Adaptive
microphone array employing calibration signals: an analytical evaluation', IEEE Trans.
Speech, Audio Processing, vol. 7, no. 3, pp. 241-252, May 1999, an LMS based algorithm for 
the MWF has been developed. However, said algorithm needs recordings of calibration 
signals. Since room acoustics, microphone characteristics and the location of the desired 
speaker change over time, frequent re-calibration is required, making this approach 
cumbersome and expensive. Also an LMS based SDW-MWF has been proposed that avoids 
the need for calibration signals (see Florencio & Malvar, 'Multichannel filtering for optimum 
noise reduction in microphone arrays', Int. Conf. on Acoust., Speech, and Signal Proc, Salt 
Lake City, USA, pp. 197-200, May 2001). This algorithm however relies on some 
independence assumptions that are not necessarily satisfied, resulting in degraded 
performance. 

[0032] The GSC and MWF techniques are now presented in more detail.

Generalized Sidelobe Canceller (GSC)

[0033] Fig. 1 describes the concept of the Generalized Sidelobe Canceller (GSC), which
consists of a fixed, spatial pre-processor, i.e. a fixed beamformer A(z) and a blocking matrix
B(z), and an ANC. Given M microphone signals

$$u_i[k] = u_i^s[k] + u_i^n[k], \quad i = 1,\ldots,M$$ (equation 1)

with $u_i^s[k]$ the desired speech contribution and $u_i^n[k]$ the noise contribution, the fixed
beamformer A(z) (e.g. delay-and-sum) creates a so-called speech reference

$$y_0[k] = y_0^s[k] + y_0^n[k]$$ (equation 2)

by steering a beam towards the direction of the desired signal, and comprising a speech
contribution $y_0^s[k]$ and a noise contribution $y_0^n[k]$. The blocking matrix B(z) creates M-1
so-called noise references

$$y_i[k] = y_i^s[k] + y_i^n[k], \quad i = 1,\ldots,M-1$$ (equation 3)

by steering zeroes towards the direction of the desired signal source such that the noise
contributions $y_i^n[k]$ are dominant compared to the speech leakage contributions $y_i^s[k]$. In the
sequel, the superscripts s and n are used to refer to the speech and the noise contribution of a
signal. During periods of speech + noise, the references $y_i[k]$, $i = 0,\ldots,M-1$ contain speech +
noise. During periods of noise only, the references only consist of a noise component, i.e.
$y_i[k] = y_i^n[k]$. The second order statistics of the noise signal are assumed to be quite
stationary such that they can be estimated during periods of noise only.
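
As a rough illustration of how a delay-and-sum beamformer forms the speech reference $y_0[k]$, the following sketch time-aligns the microphone signals and averages them. The integer sample delays, the white-noise model and the function name `delay_and_sum` are assumptions made for the example only; they are not taken from the source.

```python
import numpy as np

def delay_and_sum(u, delays):
    """Align each microphone by an integer sample delay and average.

    u: (M, N) microphone signals; delays: arrival delay of the desired
    source at each microphone, in samples (assumed known here).
    """
    u = np.asarray(u, dtype=float)
    M, N = u.shape
    aligned = np.zeros_like(u)
    for i, d in enumerate(delays):
        aligned[i, : N - d] = u[i, d:]        # advance channel i by d samples
    return aligned.mean(axis=0), aligned

rng = np.random.default_rng(0)
s = rng.standard_normal(2000)                 # stand-in for the speech signal
delays = [0, 2, 4]
# Each microphone sees the source delayed by its own arrival delay.
mics = np.stack([np.concatenate([np.zeros(d), s[: s.size - d]]) for d in delays])
noise = 0.5 * rng.standard_normal(mics.shape) # uncorrelated sensor noise, variance 0.25
y0, aligned = delay_and_sum(mics + noise, delays)
```

With uncorrelated noise of equal power at M microphones, the averaging leaves the aligned speech untouched and reduces the noise power by roughly a factor M.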

[0034] To design the fixed, spatial pre-processor, assumptions are made about the
microphone characteristics, the speaker position and the microphone positions and
furthermore reverberation is assumed to be absent. If these assumptions are satisfied, the
noise references do not contain any speech, i.e., $y_i^s[k] = 0$, for $i = 1,\ldots,M-1$. However, in
practice, these assumptions are often violated (e.g. due to microphone mismatch and
reverberation) such that speech leaks into the noise references. To limit the effect of such
speech leakage, the ANC filter $\mathbf{w}_{1:M-1} \in \mathbb{C}^{(M-1)L \times 1}$

$$\mathbf{w}_{1:M-1}^H = [\mathbf{w}_1^H \;\; \mathbf{w}_2^H \;\; \ldots \;\; \mathbf{w}_{M-1}^H]$$ (equation 4)

where

$$\mathbf{w}_i = [w_i[0] \;\; \ldots \;\; w_i[L-1]]^T,$$ (equation 5)

with L the filter length, is adapted during periods of noise only. (Note that in a time-domain
implementation the input signals of the adaptive filter $\mathbf{w}_{1:M-1}$ and the filter $\mathbf{w}_{1:M-1}$ are real. In
the sequel the formulas are generalised to complex input signals such that they can also be
applied to a subband implementation.) Hence, the ANC filter $\mathbf{w}_{1:M-1}$ minimises the output
noise power, i.e.

$$\mathbf{w}_{1:M-1} = \arg\min_{\mathbf{w}_{1:M-1}} E\{|y_0^n[k-\Delta] - \mathbf{w}_{1:M-1}^H[k]\,\mathbf{y}_{1:M-1}^n[k]|^2\}$$ (equation 6)

leading to

$$\mathbf{w}_{1:M-1} = E\{\mathbf{y}_{1:M-1}^n[k]\,\mathbf{y}_{1:M-1}^{n,H}[k]\}^{-1} E\{\mathbf{y}_{1:M-1}^n[k]\,y_0^{n,*}[k-\Delta]\},$$ (equation 7)

where

$$\mathbf{y}_{1:M-1}^{n,H}[k] = [\mathbf{y}_1^{n,H}[k] \;\; \mathbf{y}_2^{n,H}[k] \;\; \ldots \;\; \mathbf{y}_{M-1}^{n,H}[k]]$$ (equation 8)

$$\mathbf{y}_i^n[k] = [y_i^n[k] \;\; y_i^n[k-1] \;\; \ldots \;\; y_i^n[k-L+1]]^T$$ (equation 9)

and where $\Delta$ is a delay applied to the speech reference to allow for non-causal taps in the
filter $\mathbf{w}_{1:M-1}$. The delay $\Delta$ is usually set to $\lceil L/2 \rceil$, where $\lceil x \rceil$ denotes the smallest integer equal
to or larger than $x$. The subscript 1:M-1 in $\mathbf{w}_{1:M-1}$ and $\mathbf{y}_{1:M-1}$ refers to the subscripts of the first
and the last channel component of the adaptive filter and input vector, respectively.
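
The closed-form solution of equation 7 can be approximated from noise-only samples by replacing the expectations with time averages. The sketch below does this for real-valued signals; the helper names `stack_taps` and `anc_wiener`, the single noise reference and the synthetic two-tap noise path are assumptions for illustration, not the source's implementation.

```python
import numpy as np

def stack_taps(y, L):
    """Row k holds [y[k], y[k-1], ..., y[k-L+1]] (zeros before the start)."""
    padded = np.concatenate([np.zeros(L - 1), y])
    return np.stack([padded[L - 1 - j : padded.size - j] for j in range(L)], axis=1)

def anc_wiener(noise_refs, y0, L, delay):
    """Sample estimate of equation 7 from noise-only data (real signals)."""
    Y = np.hstack([stack_taps(r, L) for r in noise_refs])          # (N, (M-1)L)
    d = np.concatenate([np.zeros(delay), y0[: y0.size - delay]])   # y0[k - delay]
    R = Y.T @ Y / len(d)                                           # noise correlation matrix
    p = Y.T @ d / len(d)                                           # cross-correlation vector
    return np.linalg.solve(R, p), Y, d

rng = np.random.default_rng(0)
n = rng.standard_normal(5000)                 # noise-only segment
y1 = n                                        # single noise reference (M = 2)
y0 = np.convolve(n, [0.5, -0.3])[: n.size]    # noise component of the speech reference
L = 8
delay = int(np.ceil(L / 2))                   # delay set to ceil(L/2)
w, Y, d = anc_wiener([y1], y0, L, delay)
residual = d - Y @ w                          # output noise after cancellation
```

Because the delay shifts the target by four samples, the two non-zero taps of the recovered filter sit at indices 4 and 5, which shows how the delay makes room for non-causal taps.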

[0035] Under ideal conditions ($y_i^s[k] = 0$, $i = 1,\ldots,M-1$), the GSC minimises the residual
noise while not distorting the desired speech signal, i.e. $z^s[k] = y_0^s[k-\Delta]$. However, when
used in combination with small-sized arrays, a small error in the assumed signal model
(resulting in $y_i^s[k] \neq 0$, $i = 1,\ldots,M-1$) already suffices to produce a significantly distorted
output speech signal $z^s[k]$

$$z^s[k] = y_0^s[k-\Delta] - \mathbf{w}_{1:M-1}^H \mathbf{y}_{1:M-1}^s[k],$$ (equation 10)

even when only adapting during noise-only periods, such that a robustness constraint on
$\mathbf{w}_{1:M-1}$ is required. In addition, the fixed beamformer A(z) should be designed such that the
distortion in the speech reference $y_0^s[k]$ is minimal for all possible model errors. In the
sequel, a delay-and-sum beamformer is used. For small-sized arrays, this beamformer offers
sufficient robustness against signal model errors, as it minimises the noise sensitivity. The
noise sensitivity is defined as the ratio of the spatially white noise gain to the gain of the
desired signal and is often used to quantify the sensitivity of an algorithm against errors in the
assumed signal model. When statistical knowledge is given about the signal model errors that
occur in practice, the fixed beamformer and the blocking matrix can be further optimised.
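
The noise sensitivity defined above can be evaluated directly for a given set of beamformer weights. In the sketch below, the all-ones steering vector (time-aligned microphones) and the differential-like weight vector are assumptions chosen only to contrast with delay-and-sum.

```python
import numpy as np

def noise_sensitivity(f, d):
    """White-noise gain f^H f divided by the desired-signal gain |f^H d|^2."""
    f = np.asarray(f, dtype=complex)
    d = np.asarray(d, dtype=complex)
    return float(np.real(np.vdot(f, f)) / np.abs(np.vdot(f, d)) ** 2)

M = 3
d = np.ones(M)                        # assumed steering vector after time alignment
das = d / M                           # delay-and-sum weights
diff = np.array([10.0, -9.0, 0.0])    # differential-like weights, unit gain towards d

ns_das = noise_sensitivity(das, d)    # equals 1/M for delay-and-sum
ns_diff = noise_sensitivity(diff, d)  # much larger for the differential-like design
```

Both weight vectors pass the desired signal with unit gain, but the differential-like design amplifies spatially white noise (and hence small model errors) far more strongly.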

[0036] A common approach to increase the robustness of the GSC is to apply a Quadratic
Inequality Constraint (QIC) to the ANC filter $\mathbf{w}_{1:M-1}$, such that the optimisation criterion (eq.
6) of the GSC is modified into

$$\mathbf{w}_{1:M-1} = \arg\min_{\mathbf{w}_{1:M-1}} E\{|y_0^n[k-\Delta] - \mathbf{w}_{1:M-1}^H[k]\,\mathbf{y}_{1:M-1}^n[k]|^2\} \quad \text{subject to} \quad \mathbf{w}_{1:M-1}^H \mathbf{w}_{1:M-1} \le \beta^2.$$ (equation 11)

The QIC avoids excessive growth of the filter coefficients $\mathbf{w}_{1:M-1}$. Hence, it reduces the
undesired speech distortion when speech leaks into the noise references.
The QIC-GSC can be implemented using the adaptive scaled projection algorithm (SPA): at
each update step, the quadratic constraint is applied to the newly obtained ANC filter by
scaling the filter coefficients by $\beta / \|\mathbf{w}_{1:M-1}\|$ when $\mathbf{w}_{1:M-1}^H \mathbf{w}_{1:M-1}$ exceeds $\beta^2$. Recently, Tian et al.
implemented the quadratic constraint by using variable loading ('Recursive least squares
implementation for LCMP Beamforming under quadratic constraint', IEEE Trans. Signal
Processing, vol. 49, no. 6, pp. 1138-1145, June 2001). For Recursive Least Squares (RLS),
this technique provides a better approximation to the optimal solution (eq. 11) than the scaled
projection algorithm.
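
A minimal sketch of the scaled projection step combined with an NLMS update is given below for real-valued signals. The function name `nlms_spa_step`, the step size and the synthetic noise-free identification problem are assumptions for illustration.

```python
import numpy as np

def nlms_spa_step(w, y, d, step, beta, eps=1e-8):
    """One NLMS update of the ANC followed by scaling back onto ||w|| <= beta."""
    e = d - w @ y                               # error: reference minus filtered noise refs
    w = w + step * e * y / (y @ y + eps)        # unconstrained NLMS update
    norm = np.linalg.norm(w)
    if norm > beta:                             # w^T w exceeds beta^2
        w = w * (beta / norm)                   # scale coefficients by beta / ||w||
    return w, e

rng = np.random.default_rng(1)
w_true = np.array([1.5, -1.0, 0.5, 0.0])        # unconstrained optimum, norm > 1

def run(beta, n_iter=3000):
    w = np.zeros(4)
    for _ in range(n_iter):
        y = rng.standard_normal(4)
        w, _ = nlms_spa_step(w, y, w_true @ y, step=0.5, beta=beta)
    return w

w_free = run(beta=np.inf)                       # no constraint: reaches w_true
w_qic = run(beta=1.0)                           # constrained: norm never exceeds 1
```

The constrained run cannot reach the unconstrained optimum, which mirrors the reduced noise reduction that the source attributes to the QIC-GSC.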

Multi-Channel Wiener Filtering (MWF)

[0037] The Multi-channel Wiener filtering (MWF) technique provides a Minimum Mean
Square Error (MMSE) estimate of the desired signal portion in one of the received
microphone signals. In contrast to the GSC, this filtering technique does not make any a
priori assumptions about the signal model and is found to be more robust. Especially in
complex noise scenarios such as multiple noise sources or diffuse noise, the MWF
outperforms the GSC, even when the GSC is supplied with a robustness constraint.

[0038] The MWF $\mathbf{w}_{1:M} \in \mathbb{C}^{ML \times 1}$ minimises the Mean Square Error (MSE) between a
delayed version of the (unknown) speech signal $u_i^s[k-\Delta]$ at the $i$-th (e.g. first) microphone
and the sum $\mathbf{w}_{1:M}^H \mathbf{u}_{1:M}[k]$ of the M filtered microphone signals, i.e.

$$\mathbf{w}_{1:M} = \arg\min_{\mathbf{w}_{1:M}} E\{|u_i^s[k-\Delta] - \mathbf{w}_{1:M}^H \mathbf{u}_{1:M}[k]|^2\},$$ (equation 12)

leading to

$$\mathbf{w}_{1:M} = E\{\mathbf{u}_{1:M}[k]\,\mathbf{u}_{1:M}^H[k]\}^{-1} E\{\mathbf{u}_{1:M}[k]\,u_i^{s,*}[k-\Delta]\},$$ (equation 13)

with

$$\mathbf{w}_{1:M}^H = [\mathbf{w}_1^H \;\; \mathbf{w}_2^H \;\; \ldots \;\; \mathbf{w}_M^H],$$ (equation 14)

$$\mathbf{u}_{1:M}^H[k] = [\mathbf{u}_1^H[k] \;\; \mathbf{u}_2^H[k] \;\; \ldots \;\; \mathbf{u}_M^H[k]],$$ (equation 15)

$$\mathbf{u}_i[k] = [u_i[k] \;\; u_i[k-1] \;\; \ldots \;\; u_i[k-L+1]]^T,$$ (equation 16)

where $u_i[k]$ comprises a speech component and a noise component.

[0039] An equivalent approach consists in estimating a delayed version of the (unknown)
noise signal $u_i^n[k-\Delta]$ in the $i$-th microphone, resulting in

$$\mathbf{w}_{1:M} = \arg\min_{\mathbf{w}_{1:M}} E\{|u_i^n[k-\Delta] - \mathbf{w}_{1:M}^H \mathbf{u}_{1:M}[k]|^2\},$$ (equation 17)

and

$$\mathbf{w}_{1:M} = E\{\mathbf{u}_{1:M}[k]\,\mathbf{u}_{1:M}^H[k]\}^{-1} E\{\mathbf{u}_{1:M}[k]\,u_i^{n,*}[k-\Delta]\},$$ (equation 18)

where

$$\mathbf{w}_{1:M}^H = [\mathbf{w}_1^H \;\; \mathbf{w}_2^H \;\; \ldots \;\; \mathbf{w}_M^H].$$ (equation 19)

The estimate $z[k]$ of the speech component $u_i^s[k-\Delta]$ is then obtained by subtracting the
estimate $\mathbf{w}_{1:M}^H \mathbf{u}_{1:M}[k]$ of $u_i^n[k-\Delta]$ from the delayed, $i$-th microphone signal $u_i[k-\Delta]$, i.e.

$$z[k] = u_i[k-\Delta] - \mathbf{w}_{1:M}^H \mathbf{u}_{1:M}[k].$$ (equation 20)

This is depicted in Fig. 2 for $u_i^n[k-\Delta] = u_1^n[k-\Delta]$.
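
The equivalence between estimating the speech directly (equation 13) and subtracting a noise estimate (equations 18 and 20) can be checked numerically. The sketch below assumes real-valued signals, uncorrelated speech and noise, randomly generated correlation matrices, and a delay smaller than the filter length so that $u_i[k-\Delta]$ is one entry of the stacked input vector; the function name `mwf_pair` is illustrative.

```python
import numpy as np

def mwf_pair(Rs, Rn, sel):
    """Speech-estimating (eq. 13) and noise-estimating (eq. 18) Wiener filters.

    Rs, Rn: speech and noise correlation matrices of the stacked input
    (speech and noise assumed uncorrelated, so R = Rs + Rn).
    sel: index of the entry of the stacked input that equals u_i[k - delay].
    """
    R = Rs + Rn
    w_speech = np.linalg.solve(R, Rs[:, sel])
    w_noise = np.linalg.solve(R, Rn[:, sel])
    return w_speech, w_noise

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
Rs, Rn = A @ A.T, B @ B.T + 0.1 * np.eye(4)     # synthetic positive definite matrices
sel = 1
w_speech, w_noise = mwf_pair(Rs, Rn, sel)

u = rng.standard_normal(4)                      # any stacked input vector
z_direct = w_speech @ u                         # speech estimate via equation 13
z_subtract = u[sel] - w_noise @ u               # speech estimate via equation 20
```

The two filters sum to the unit vector selecting $u_i[k-\Delta]$, so both routes give the same output for every input vector.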
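[0040] The residual error energy of the MWF equals

$$E\{|e[k]|^2\} = E\{|u_i^s[k-\Delta] - \mathbf{w}_{1:M}^H \mathbf{u}_{1:M}[k]|^2\},$$ (equation 21)

and can be decomposed into

$$\underbrace{E\{|u_i^s[k-\Delta] - \mathbf{w}_{1:M}^H \mathbf{u}_{1:M}^s[k]|^2\}}_{\varepsilon_d^2} + \underbrace{E\{|\mathbf{w}_{1:M}^H \mathbf{u}_{1:M}^n[k]|^2\}}_{\varepsilon_n^2}$$ (equation 22)

where $\varepsilon_d^2$ equals the speech distortion energy and $\varepsilon_n^2$ the residual noise energy. The design
criterion of the MWF can be generalised to allow for a trade-off between speech distortion
and noise reduction, by incorporating a weighting factor $\mu$ with $\mu \in [0,\infty]$

$$\mathbf{w}_{1:M} = \arg\min_{\mathbf{w}_{1:M}} E\{|u_i^s[k-\Delta] - \mathbf{w}_{1:M}^H \mathbf{u}_{1:M}^s[k]|^2\} + \mu E\{|\mathbf{w}_{1:M}^H \mathbf{u}_{1:M}^n[k]|^2\}.$$ (equation 23)

The solution of (eq. 23) is given by

$$\mathbf{w}_{1:M} = \left(E\{\mathbf{u}_{1:M}^s[k]\,\mathbf{u}_{1:M}^{s,H}[k]\} + \mu E\{\mathbf{u}_{1:M}^n[k]\,\mathbf{u}_{1:M}^{n,H}[k]\}\right)^{-1} E\{\mathbf{u}_{1:M}^s[k]\,u_i^{s,*}[k-\Delta]\}.$$ (equation 24)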
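
The step from (eq. 23) to (eq. 24) can be made explicit as follows. The shorthand $\mathbf{R}_s$, $\mathbf{R}_n$, $\mathbf{p}_s$ and $\sigma_s^2$ is introduced here only for this derivation and does not appear in the source.

```latex
% Shorthand (assumed for this sketch only):
%   R_s = E{u^s u^{s,H}},  R_n = E{u^n u^{n,H}},
%   p_s = E{u^s_{1:M}[k] u_i^{s,*}[k-\Delta]},  \sigma_s^2 = E{|u_i^s[k-\Delta]|^2}
\begin{align*}
J(\mathbf{w}) &= E\{|u_i^s[k-\Delta] - \mathbf{w}^H \mathbf{u}_{1:M}^s[k]|^2\}
               + \mu\, E\{|\mathbf{w}^H \mathbf{u}_{1:M}^n[k]|^2\} \\
              &= \sigma_s^2 - \mathbf{w}^H \mathbf{p}_s - \mathbf{p}_s^H \mathbf{w}
               + \mathbf{w}^H \mathbf{R}_s \mathbf{w} + \mu\, \mathbf{w}^H \mathbf{R}_n \mathbf{w}, \\
\frac{\partial J}{\partial \mathbf{w}^*} &= -\mathbf{p}_s + (\mathbf{R}_s + \mu \mathbf{R}_n)\,\mathbf{w} = \mathbf{0}
\quad\Longrightarrow\quad
\mathbf{w} = (\mathbf{R}_s + \mu \mathbf{R}_n)^{-1}\, \mathbf{p}_s .
\end{align*}
```

For $\mu = 1$ and uncorrelated speech and noise, $\mathbf{R}_s + \mathbf{R}_n$ is the correlation matrix of the full input and the expression reduces to (eq. 13).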

[0041] Equivalently, the optimisation criterion for $\mathbf{w}_{1:M}$ in (eq. 17) can be modified into

$$\mathbf{w}_{1:M} = \arg\min_{\mathbf{w}_{1:M}} E\{|\mathbf{w}_{1:M}^H \mathbf{u}_{1:M}^s[k]|^2\} + \mu E\{|u_i^n[k-\Delta] - \mathbf{w}_{1:M}^H \mathbf{u}_{1:M}^n[k]|^2\},$$ (equation 25)

resulting in

$$\mathbf{w}_{1:M} = \left(E\{\mathbf{u}_{1:M}^n[k]\,\mathbf{u}_{1:M}^{n,H}[k]\} + \frac{1}{\mu} E\{\mathbf{u}_{1:M}^s[k]\,\mathbf{u}_{1:M}^{s,H}[k]\}\right)^{-1} E\{\mathbf{u}_{1:M}^n[k]\,u_i^{n,*}[k-\Delta]\}.$$ (equation 26)

In the sequel, (eq. 26) will be referred to as the Speech Distortion Weighted Multi-channel
Wiener Filter (SDW-MWF).

The factor $\mu \in [0,\infty]$ trades off speech distortion versus noise reduction. If $\mu = 1$, the MMSE
criterion (eq. 12) or (eq. 17) is obtained. If $\mu > 1$, the residual noise level will be reduced at the
expense of increased speech distortion. By setting $\mu$ to $\infty$, all emphasis is put on noise
reduction and speech distortion is completely ignored. Setting $\mu$ to 0 on the other hand,
results in no noise reduction.
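
The trade-off controlled by $\mu$ can be illustrated by evaluating (eq. 26) for several values of $\mu$ and measuring the speech distortion and residual noise energies of the output $z[k]$ of (eq. 20). The correlation matrices below are synthetic, real-valued and randomly generated; the function names are assumptions for this sketch.

```python
import numpy as np

def sdw_mwf(Rs, Rn, sel, mu):
    """Noise-estimating SDW-MWF of equation 26 for a given trade-off mu > 0."""
    return np.linalg.solve(Rn + Rs / mu, Rn[:, sel])

def energies(w, Rs, Rn, sel):
    """Speech distortion and residual noise energy of z[k] = u_i[k-delay] - w^T u[k]."""
    e = np.zeros(len(w))
    e[sel] = 1.0
    sd = float(w @ Rs @ w)                  # energy of the speech removed by the filter
    rn = float((e - w) @ Rn @ (e - w))      # energy of the noise left in the output
    return sd, rn

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
Rs, Rn = A @ A.T, B @ B.T + 0.1 * np.eye(4)
mus = [0.1, 1.0, 10.0]
results = [energies(sdw_mwf(Rs, Rn, 0, mu), Rs, Rn, 0) for mu in mus]
```

As $\mu$ grows, the residual noise energy falls while the speech distortion energy rises, in line with the description above.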
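
[0042] In practice, the correlation matrix $E\{\mathbf{u}_{1:M}^s[k]\,\mathbf{u}_{1:M}^{s,H}[k]\}$ is unknown. During periods
of speech, the inputs $u_i[k]$ consist of speech + noise, i.e., $u_i[k] = u_i^s[k] + u_i^n[k]$, $i = 1,\ldots,M$.
During periods of noise, only the noise component $u_i^n[k]$ is observed. Assuming that the
speech signal and the noise signal are uncorrelated, $E\{\mathbf{u}_{1:M}^s[k]\,\mathbf{u}_{1:M}^{s,H}[k]\}$ can be estimated as

$$E\{\mathbf{u}_{1:M}^s[k]\,\mathbf{u}_{1:M}^{s,H}[k]\} = E\{\mathbf{u}_{1:M}[k]\,\mathbf{u}_{1:M}^H[k]\} - E\{\mathbf{u}_{1:M}^n[k]\,\mathbf{u}_{1:M}^{n,H}[k]\},$$ (equation 27)

where the second order statistics $E\{\mathbf{u}_{1:M}[k]\,\mathbf{u}_{1:M}^H[k]\}$ are estimated during speech + noise and
the second order statistics $E\{\mathbf{u}_{1:M}^n[k]\,\mathbf{u}_{1:M}^{n,H}[k]\}$ during periods of noise only. As for the GSC, a
robust speech detection is thus needed. Using (eq. 27), (eq. 24) and (eq. 26) can be re-written as

$$\mathbf{w}_{1:M} = \left(E\{\mathbf{u}_{1:M}[k]\,\mathbf{u}_{1:M}^H[k]\} + (\mu - 1) E\{\mathbf{u}_{1:M}^n[k]\,\mathbf{u}_{1:M}^{n,H}[k]\}\right)^{-1} \left(E\{\mathbf{u}_{1:M}[k]\,u_i^*[k-\Delta]\} - E\{\mathbf{u}_{1:M}^n[k]\,u_i^{n,*}[k-\Delta]\}\right)$$ (equation 28)

and

$$\mathbf{w}_{1:M} = \left(\frac{1}{\mu} E\{\mathbf{u}_{1:M}[k]\,\mathbf{u}_{1:M}^H[k]\} + \left(1 - \frac{1}{\mu}\right) E\{\mathbf{u}_{1:M}^n[k]\,\mathbf{u}_{1:M}^{n,H}[k]\}\right)^{-1} E\{\mathbf{u}_{1:M}^n[k]\,u_i^{n,*}[k-\Delta]\}.$$ (equation 29)

The Wiener filter may be computed at each time instant k by means of a Generalised
Singular Value Decomposition (GSVD) of a speech + noise and noise data matrix. A cheaper
recursive alternative based on a QR-decomposition is also available. Additionally, a subband
implementation increases the resulting speech intelligibility and reduces complexity, making
it suitable for hearing aid applications.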
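
The estimate of (eq. 27) can be sketched with sample averages taken over speech + noise frames and noise-only frames. The voice activity flags are assumed to be given, and the rank-one speech model, the white noise and the function name `estimate_speech_correlation` are assumptions made for the example.

```python
import numpy as np

def estimate_speech_correlation(U, speech_active):
    """Estimate E{u^s u^{s,H}} as in equation 27.

    U: (N, D) array whose rows are the stacked input vectors.
    speech_active: (N,) boolean flags from a speech detector (assumed given).
    """
    U = np.asarray(U)
    speech_active = np.asarray(speech_active, dtype=bool)
    Us, Un = U[speech_active], U[~speech_active]
    Ru = Us.T @ Us.conj() / len(Us)      # speech + noise statistics
    Rn = Un.T @ Un.conj() / len(Un)      # noise-only statistics
    return Ru - Rn, Ru, Rn

rng = np.random.default_rng(4)
N = 20000
a = np.array([1.0, 0.8, 0.6])            # assumed per-channel speech gains
active = np.arange(N) < N // 2           # first half: speech + noise, second half: noise only
speech = 2.0 * rng.standard_normal(N) * active
U = np.outer(speech, a) + rng.standard_normal((N, 3))

Rs_hat, Ru_hat, Rn_hat = estimate_speech_correlation(U, active)
Rs_true = 4.0 * np.outer(a, a)
rel_err = np.linalg.norm(Rs_hat - Rs_true) / np.linalg.norm(Rs_true)
```

Note that the difference of two sample estimates is not guaranteed to be positive semi-definite, which is one reason a reliable speech detector and sufficiently long averaging matter in practice.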

[0043] The present invention is now described in detail. First, the proposed adaptive 
multi-channel noise reduction technique, referred to as Spatially Pre-processed Speech 
Distortion Weighted Multi-channel Wiener filter, is described. 

[0044] A first aspect of the invention is referred to as Speech Distortion Regularised GSC 
(SDR-GSC). A new design criterion is developed for the adaptive stage of the GSC: the ANC 
design criterion is supplemented with a regularisation term that limits speech distortion due to 
signal model errors. In the SDR-GSC, a parameter $\mu$ is incorporated that allows for a trade-off
between speech distortion and noise reduction. Focusing all attention towards noise
reduction results in the standard GSC, while, on the other hand, focusing all attention
towards speech distortion results in the output of the fixed beamformer. In noise scenarios
with low SNR, adaptivity in the SDR-GSC can be easily reduced or excluded by increasing
attention towards speech distortion, i.e., by decreasing the parameter $\mu$ to 0. The SDR-GSC is
an alternative to the QIC-GSC to decrease the sensitivity of the GSC to signal model errors
such as microphone mismatch and reverberation. In contrast to the QIC-GSC, the SDR-GSC
shifts emphasis towards speech distortion when the amount of speech leakage grows. In the 
absence of signal model errors, the performance of the GSC is preserved. As a result, a better 
noise reduction performance is obtained for small model errors, while guaranteeing 
robustness against large model errors. 

[0045] In a next step, the noise reduction performance of the SDR-GSC is further 
improved by adding an extra adaptive filtering operation $\mathbf{w}_0$ on the speech reference signal.
This generalised scheme is referred to as Spatially Pre-processed Speech Distortion Weighted
Multi-channel Wiener Filter (SP-SDW-MWF). The SP-SDW-MWF is depicted in Fig. 3 and
encompasses the MWF as a special case. Again, a parameter $\mu$ is incorporated in the design
criterion to allow for a trade-off between speech distortion and noise reduction. Focusing all
attention towards speech distortion results in the output of the fixed beamformer. Also here,
adaptivity can be easily reduced or excluded by decreasing $\mu$ to 0. It is shown that, in the
absence of speech leakage and for infinitely long filter lengths, the SP-SDW-MWF
corresponds to a cascade of a SDR-GSC with a Speech Distortion Weighted Single-channel
Wiener filter (SDW-SWF). In the presence of speech leakage, the SP-SDW-MWF with $\mathbf{w}_0$
tries to preserve its performance: the SP-SDW-MWF then contains extra filtering operations 
that compensate for the performance degradation due to speech leakage. Hence, in contrast to 
the SDR-GSC (and thus also the GSC), performance does not degrade due to microphone 
mismatch. Recursive implementations of the (SDW-)MWF exist that are based on a GSVD or 
QR decomposition. Additionally, a subband implementation results in improved intelligibility 
at a significantly lower complexity compared to the fullband approach. These techniques can 
be extended to implement the SDR-GSC and, more generally, the SP-SDW-MWF. 

[0046] In this invention, cheap time-domain and frequency-domain stochastic gradient 
implementations of the SDR-GSC and the SP-SDW-MWF are proposed as well. Starting 
from the design criterion of the SDR-GSC, or more generally, the SP-SDW-MWF, a time- 
domain stochastic gradient algorithm is derived. To increase the convergence speed and 
reduce the computational complexity, the algorithm is implemented in the frequency-domain.
To reduce the large excess error from which the stochastic gradient algorithm suffers when
used in highly non-stationary noise, a low pass filter is applied to the part of the gradient 
estimate that limits speech distortion. The low pass filter avoids a highly time-varying 
distortion of the desired speech component while not degrading the tracking performance 
needed in time-varying noise scenarios. Experimental results show that the low pass filter 
significantly improves the performance of the stochastic gradient algorithm and does not 
compromise the tracking of changes in the noise scenario. In addition, experiments 
demonstrate that the proposed stochastic gradient algorithm preserves the benefit of the SP- 
SDW-MWF over the QIC-GSC, while its computational complexity is comparable to the 
NLMS based scaled projection algorithm for implementing the QIC. The stochastic gradient 
algorithm with low pass filter however requires data buffers, which results in a large memory 
cost. The memory cost can be decreased by approximating the regularisation term in the 
frequency-domain using (diagonal) correlation matrices, making an implementation of the 
SP-SDW-MWF in commercial hearing aids feasible both in terms of complexity as well as 
memory cost. Experimental results show that the stochastic gradient algorithm using 
correlation matrices has the same performance as the stochastic gradient algorithm with low 
pass filter. 

Spatially pre-processed SDW Multi-channel Wiener Filter 
Concept 

[0047] Fig. 3 depicts the Spatially pre-processed, Speech Distortion Weighted Multi- 
channel Wiener filter (SP-SDW-MWF). The SP-SDW-MWF consists of a fixed, spatial pre- 
processor, i.e. a fixed beamformer A(z) and a blocking matrix B(z), and an adaptive Speech 
Distortion Weighted Multi-channel Wiener filter (SDW-MWF). Given M microphone signals 

ii, [it] = u'[k] + u"[k\i = \,...,M (equation 30) 

with u*[k] the desired speech contribution and u"[k] the noise contribution, the fixed 
beamformer A(z) creates a so-called speech reference 

y 0 [k] = y s 0 [k] + y n 0 [kl (equation 31) 

by steering a beam towards the direction of the desired signal, and comprising a speech 
contribution y s 0 [k] and a noise contribution y^[k]. To preserve the robustness advantage of 
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the MWF, the fixed beamformer A(z) should be designed such that the distortion in the 
speech reference y s 0 [k] is minimal for all possible errors in the assumed signal model such as 
microphone mismatch. In the sequel, a delay-and-sum beamformer is used. For small-sized 
arrays, this beamformer offers sufficient robustness against signal model errors as it 
minimises the noise sensitivity. Given statistical knowledge about the signal model errors that 
occur in practice, a further optimised filter-and-sum beamformer A(z) can be designed. The 
blocking matrix B(z) creates M-1 so-called noise references

$$y_i[k] = y_i^s[k] + y_i^n[k], \quad i = 1,\ldots,M-1$$ (equation 32)

by steering zeroes towards the direction of interest such that the noise contributions $y_i^n[k]$ are
dominant compared to the speech leakage contributions $y_i^s[k]$. A simple technique to create
the noise references consists of pairwise subtracting the time-aligned microphone signals. 
Further optimised noise references can be created, e.g. by minimising speech leakage for a 
specified angular region around the direction of interest instead of for the direction of interest 
only (e.g. for an angular region from -20° to 20° around the direction of interest). In addition, 
given statistical knowledge about the signal model errors that occur in practice, speech 
leakage can be minimised for all possible signal model errors. 
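
The simplest pre-processor described above (delay-and-sum speech reference, pairwise-subtraction noise references) can be sketched as follows. This is a minimal illustration, assuming the microphone signals are already time-aligned towards the desired source; the function name `spatial_preprocessor` and the sample values are hypothetical and not taken from the original text.

```python
def spatial_preprocessor(mics):
    """Sketch of a fixed spatial pre-processor.

    mics: list of M equal-length lists of time-aligned samples.
    Returns (speech_ref, noise_refs) with one speech reference and
    M-1 noise references.
    """
    M = len(mics)
    n = len(mics[0])
    # Delay-and-sum beamformer: average of the time-aligned signals.
    speech_ref = [sum(mics[m][k] for m in range(M)) / M for k in range(n)]
    # Blocking matrix: pairwise differences of adjacent microphones.
    noise_refs = [
        [mics[m][k] - mics[m + 1][k] for k in range(n)]
        for m in range(M - 1)
    ]
    return speech_ref, noise_refs


speech = [1.0, -2.0, 0.5, 3.0]

# Perfectly matched microphones: the speech cancels in the noise references.
matched = [speech[:], speech[:], speech[:]]
ref, noise = spatial_preprocessor(matched)

# A gain mismatch on the second microphone makes speech leak through.
mismatched = [speech[:], [1.5 * s for s in speech], speech[:]]
_, leaky = spatial_preprocessor(mismatched)
```

With matched microphones the noise references contain no speech, whereas the mismatched case shows the speech leakage that the adaptive stage has to cope with.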

[0048] In the sequel, the superscripts s and n are used to refer to the speech and the noise 
contribution of a signal. During periods of speech + noise, the references $y_i[k]$,
$i = 0,\ldots,M-1$ contain speech + noise. During periods of noise only, $y_i[k]$, $i = 0,\ldots,M-1$ only
consist of a noise component, i.e. $y_i[k] = y_i^n[k]$. The second order statistics of the noise
signal are assumed to be quite stationary such that they can be estimated during periods of 
noise only. 

[0049] The SDW-MWF filter $\mathbf{w}_{0:M-1}$

$$\mathbf{w}_{0:M-1} = \left(\frac{1}{\mu} E\{\mathbf{y}^s_{0:M-1}[k]\,\mathbf{y}^{s,H}_{0:M-1}[k]\} + E\{\mathbf{y}^n_{0:M-1}[k]\,\mathbf{y}^{n,H}_{0:M-1}[k]\}\right)^{-1} E\{\mathbf{y}^n_{0:M-1}[k]\,y_0^{n,*}[k-\Delta]\},$$ (equation 33)

with

$$\mathbf{w}^H_{0:M-1}[k] = [\mathbf{w}_0^H[k] \;\; \mathbf{w}_1^H[k] \;\; \cdots \;\; \mathbf{w}_{M-1}^H[k]]$$ (equation 34)
$$\mathbf{w}_i[k] = [w_i[0] \;\; w_i[1] \;\; \cdots \;\; w_i[L-1]]^T$$ (equation 35)
$$\mathbf{y}^H_{0:M-1}[k] = [\mathbf{y}_0^H[k] \;\; \mathbf{y}_1^H[k] \;\; \cdots \;\; \mathbf{y}_{M-1}^H[k]]$$ (equation 36)
$$\mathbf{y}_i[k] = [y_i[k] \;\; y_i[k-1] \;\; \cdots \;\; y_i[k-L+1]]^T,$$ (equation 37)

provides an estimate $\mathbf{w}^H_{0:M-1}\mathbf{y}^n_{0:M-1}[k]$ of the noise contribution $y_0^n[k-\Delta]$ in the speech
reference by minimising the cost function $J(\mathbf{w}_{0:M-1})$

$$J(\mathbf{w}_{0:M-1}) = \frac{1}{\mu}\underbrace{E\{|\mathbf{w}^H_{0:M-1}\mathbf{y}^s_{0:M-1}[k]|^2\}}_{\varepsilon_d^2} + \underbrace{E\{|y_0^n[k-\Delta] - \mathbf{w}^H_{0:M-1}\mathbf{y}^n_{0:M-1}[k]|^2\}}_{\varepsilon_n^2}.$$ (equation 38)

The subscript 0:M-1 in $\mathbf{w}_{0:M-1}$ and $\mathbf{y}_{0:M-1}$ refers to the subscripts of the first and the last channel
component of the adaptive filter and the input vector, respectively. The term $\varepsilon_d^2$ represents
the speech distortion energy and $\varepsilon_n^2$ the residual noise energy. The term $\frac{1}{\mu}\varepsilon_d^2$ in the cost
function (eq.38) limits the possible amount of speech distortion at the output of the SP-SDW-MWF.
Hence, the SP-SDW-MWF adds robustness against signal model errors to the GSC by
taking speech distortion explicitly into account in the design criterion of the adaptive stage.
The parameter $1/\mu \in [0,\infty)$ trades off noise reduction and speech distortion: the larger $1/\mu$, the
smaller the amount of possible speech distortion. For $\mu = 0$, the output of the fixed
beamformer A(z), delayed by $\Delta$ samples, is obtained. Adaptivity can be easily reduced or
excluded in the SP-SDW-MWF by decreasing $\mu$ to 0 (e.g., in noise scenarios with a very low
signal-to-noise ratio (SNR), such as -10 dB, a fixed beamformer may be preferred).
Additionally, adaptivity can be limited by applying a QIC to $\mathbf{w}_{0:M-1}$.
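
As a brief worked step, the closed-form filter of (eq.33) can be recovered from the cost function by setting its gradient to zero. The shorthand $\mathbf{R}_s$, $\mathbf{R}_n$ and $\mathbf{r}_n$ is introduced only for this sketch and is not notation from the original text; the cost function is taken in the form implied by the description of the speech distortion energy and the residual noise energy.

```latex
% Shorthand assumed for this sketch only:
%   R_s = E{ y^s[k] y^{s,H}[k] },   R_n = E{ y^n[k] y^{n,H}[k] },
%   r_n = E{ y^n[k] y_0^{n,*}[k-\Delta] },   w = w_{0:M-1}
\begin{align*}
J(\mathbf{w}) &= \tfrac{1}{\mu}\,\mathbf{w}^H \mathbf{R}_s \mathbf{w}
   + E\{|y_0^n[k-\Delta]|^2\}
   - \mathbf{w}^H \mathbf{r}_n - \mathbf{r}_n^H \mathbf{w}
   + \mathbf{w}^H \mathbf{R}_n \mathbf{w} \\
\frac{\partial J}{\partial \mathbf{w}^*} &=
   \tfrac{1}{\mu}\,\mathbf{R}_s \mathbf{w} + \mathbf{R}_n \mathbf{w} - \mathbf{r}_n = \mathbf{0} \\
\mathbf{w} &= \left(\tfrac{1}{\mu}\,\mathbf{R}_s + \mathbf{R}_n\right)^{-1} \mathbf{r}_n
\end{align*}
```

As $1/\mu$ grows, the term $\frac{1}{\mu}\mathbf{R}_s$ dominates the inverse and, provided $\mathbf{R}_s$ is positive definite, the filter shrinks towards zero, which is consistent with the statement that $\mu = 0$ yields the delayed output of the fixed beamformer.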

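[0050] Note that when the fixed beamformer A(z) and the blocking matrix B(z) are set to

$$A(z) = [1 \;\; 0 \;\; \cdots \;\; 0]^H$$ (equation 39)

$$B(z) = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \ddots & \vdots \\ \vdots & \vdots & \ddots & \ddots & 0 \\ 0 & 0 & \cdots & 0 & 1 \end{bmatrix}$$ (equation 40)

one obtains the original SDW-MWF that operates on the received microphone signals
$u_i[k]$, $i = 1,\ldots,M$.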

[0051] Below, the different parameter settings of the SP-SDW-MWF are discussed. 
Depending on the setting of the parameter $\mu$ and the presence or the absence of the filter $\mathbf{w}_0$,
the GSC, the (SDW-)MWF as well as in-between solutions such as the Speech Distortion
Regularised GSC (SDR-GSC) are obtained. One distinguishes between two cases, i.e. the
case where no filter $\mathbf{w}_0$ is applied to the speech reference (filter length $L_0 = 0$) and the case
where an additional filter $\mathbf{w}_0$ is used ($L_0 \neq 0$).

SDR-GSC, i.e., SP-SDW-MWF without $\mathbf{w}_0$

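[0052] First, consider the case without $\mathbf{w}_0$, i.e. $L_0 = 0$. The solution for $\mathbf{w}_{1:M-1}$ in (eq.33)
then reduces to

$$\mathbf{w}_{1:M-1} = \arg\min_{\mathbf{w}_{1:M-1}} \frac{1}{\mu}\underbrace{E\{|\mathbf{w}^H_{1:M-1}\mathbf{y}^s_{1:M-1}[k]|^2\}}_{\varepsilon_d^2} + \underbrace{E\{|y_0^n[k-\Delta] - \mathbf{w}^H_{1:M-1}\mathbf{y}^n_{1:M-1}[k]|^2\}}_{\varepsilon_n^2},$$ (equation 41)

leading to

$$\mathbf{w}_{1:M-1} = \left(\frac{1}{\mu} E\{\mathbf{y}^s_{1:M-1}[k]\,\mathbf{y}^{s,H}_{1:M-1}[k]\} + E\{\mathbf{y}^n_{1:M-1}[k]\,\mathbf{y}^{n,H}_{1:M-1}[k]\}\right)^{-1} E\{\mathbf{y}^n_{1:M-1}[k]\,y_0^{n,*}[k-\Delta]\},$$ (equation 42)

where $\varepsilon_d^2$ is the speech distortion energy and $\varepsilon_n^2$ the residual noise energy.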

[0053] Compared to the optimisation criterion (eq. 6) of the GSC, a regularisation term

$$\frac{1}{\mu} E\{|\mathbf{w}^H_{1:M-1}\mathbf{y}^s_{1:M-1}[k]|^2\}$$ (equation 43)

has been added. This regularisation term limits the amount of speech distortion that is caused
by the filter $\mathbf{w}_{1:M-1}$ when speech leaks into the noise references, i.e. $y_i^s[k] \neq 0$, $i = 1,\ldots,M-1$.
In the sequel, the SP-SDW-MWF with $L_0 = 0$ is therefore referred to as the Speech Distortion
Regularised GSC (SDR-GSC). The smaller $\mu$, the smaller the resulting amount of speech
distortion will be. For $\mu = 0$, all emphasis is put on speech distortion such that z[k] is equal to
the output of the fixed beamformer A(z) delayed by $\Delta$ samples. For $\mu = \infty$ all emphasis is put
on noise reduction and speech distortion is not taken into account. This corresponds to the
standard GSC. Hence, the SDR-GSC encompasses the GSC as a special case.
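
The trade-off controlled by $1/\mu$ can be made concrete with a deliberately reduced case: one noise reference and a single filter tap, so that all correlation matrices collapse to real scalars. This is only an illustrative sketch under that assumption; the function name and the numeric values are hypothetical.

```python
def sdr_gsc_scalar(speech_leak_power, noise_power, cross_corr, inv_mu):
    """Single-tap, single-reference SDR-GSC solution (real-valued sketch).

    speech_leak_power: E{(y_1^s)^2}, speech leakage in the noise reference
    noise_power:       E{(y_1^n)^2}, noise power in the noise reference
    cross_corr:        E{y_1^n * y_0^n}, noise cross-correlation with the
                       speech reference
    inv_mu:            the trade-off parameter 1/mu
    Returns (w, speech_distortion_energy).
    """
    w = cross_corr / (inv_mu * speech_leak_power + noise_power)
    speech_distortion = w * w * speech_leak_power
    return w, speech_distortion


# 1/mu = 0 gives the plain GSC weight; larger 1/mu shrinks the weight
# and with it the speech distortion energy.
w_gsc, d_gsc = sdr_gsc_scalar(0.2, 1.0, 0.8, 0.0)
w_reg, d_reg = sdr_gsc_scalar(0.2, 1.0, 0.8, 1.0)
w_strong, d_strong = sdr_gsc_scalar(0.2, 1.0, 0.8, 10.0)

# Without speech leakage the regularisation has no effect on the weight.
w_clean_a, _ = sdr_gsc_scalar(0.0, 1.0, 0.8, 0.0)
w_clean_b, _ = sdr_gsc_scalar(0.0, 1.0, 0.8, 10.0)
```

The last two calls illustrate that the regularisation only becomes active when speech actually leaks into the noise reference.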

[0054] The regularisation term (eq. 43) with $1/\mu \neq 0$ adds robustness to the GSC, while not
affecting the noise reduction performance in the absence of speech leakage:

• In the absence of speech leakage, i.e., $y_i^s[k] = 0$, $i = 1,\ldots,M-1$, the regularisation
term equals 0 for all $\mathbf{w}_{1:M-1}$ and hence the residual noise energy $\varepsilon_n^2$ is effectively
minimised. In other words, in the absence of speech leakage, the GSC solution is
obtained.

• In the presence of speech leakage, i.e., $y_i^s[k] \neq 0$, $i = 1,\ldots,M-1$, speech distortion is
explicitly taken into account in the optimisation criterion (eq.41) for the adaptive filter
$\mathbf{w}_{1:M-1}$, limiting speech distortion while reducing noise. The larger the amount of
speech leakage, the more attention is paid to speech distortion.

To limit speech distortion alternatively, a QIC is often imposed on the filter $\mathbf{w}_{1:M-1}$. In contrast
to the SDR-GSC, the QIC acts irrespective of the amount of speech leakage $y^s[k]$ that is
present. The constraint value $\beta^2$ in (eq. 11) has to be chosen based on the largest model errors
that may occur. As a consequence, noise reduction performance is compromised even when
no or very small model errors are present. Hence, the QIC is more conservative than the
SDR-GSC, as will be shown in the experimental results.

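SP-SDW-MWF with filter $\mathbf{w}_0$

[0055] Since the SDW-MWF (eq.33) takes speech distortion explicitly into account in its
optimisation criterion, an additional filter $\mathbf{w}_0$ on the speech reference $y_0[k]$ may be added.
The SDW-MWF (eq.33) then solves the following more general optimisation criterion

$$\mathbf{w}_{0:M-1} = \arg\min_{\mathbf{w}_{0:M-1}} \frac{1}{\mu} E\{|\mathbf{w}^H_{0:M-1}\mathbf{y}^s_{0:M-1}[k]|^2\} + E\{|y_0^n[k-\Delta] - \mathbf{w}^H_{0:M-1}\mathbf{y}^n_{0:M-1}[k]|^2\},$$ (equation 44)

where $\mathbf{w}^H_{0:M-1} = [\mathbf{w}_0^H \;\; \mathbf{w}^H_{1:M-1}]$ is given by (eq.33).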

[0056] Again, $\mu$ trades off speech distortion and noise reduction. For $\mu = \infty$ speech
distortion $\varepsilon_d^2$ is completely ignored, which results in a zero output signal. For $\mu = 0$ all
emphasis is put on speech distortion such that the output signal is equal to the output of the
fixed beamformer delayed by $\Delta$ samples.

In addition, the observation can be made that in the absence of speech leakage, i.e.,
$y_i^s[k] = 0$, $i = 1,\ldots,M-1$, and for infinitely long filters $\mathbf{w}_i$, $i = 0,\ldots,M-1$, the SP-SDW-MWF
(with $\mathbf{w}_0$) corresponds to a cascade of an SDR-GSC and an SDW single-channel WF (SDW-SWF)
postfilter. In the presence of speech leakage, the SP-SDW-MWF (with $\mathbf{w}_0$) tries to
preserve its performance: the SP-SDW-MWF then contains extra filtering operations that
compensate for the performance degradation due to speech leakage. This is illustrated in Fig.
4. It can, for example, be proven that, for infinite filter lengths, the performance of the SP-SDW-MWF
(with $\mathbf{w}_0$) is not affected by microphone mismatch as long as the desired speech component at
the output of the fixed beamformer A(z) remains unaltered.

Experimental results 

[0057] The theoretical results are now illustrated by means of experimental results for a 
hearing aid application. First, the set-up and the performance measures used, are described. 
Next, the impact of the different parameter settings of the SP-SDW-MWF on the 
performance and the sensitivity to signal model errors is evaluated. Comparison is made with 
the QIC-GSC. 

[0058] Fig. 5 depicts the set-up for the experiments. A three-microphone Behind-The-Ear 
(BTE) hearing aid with three omnidirectional microphones (Knowles FG-3452) has been 
mounted on a dummy head in an office room. The interspacing between the first and the 
second microphone is about 1 cm and the interspacing between the second and the third 
microphone is about 1.5 cm. The reverberation time $T_{60\mathrm{dB}}$ of the room is about 700 ms for a
speech weighted noise. The desired speech signal and the noise signals are uncorrelated. Both 
the speech and the noise signal have a level of 70 dB SPL at the centre of the head. The 
desired speech source and noise sources are positioned at a distance of 1 meter from the head: 
the speech source in front of the head (0°), the noise sources at an angle $\theta$ w.r.t. the speech
source (see also Fig. 5). To get an idea of the average performance based on directivity only, 
stationary speech and noise signals with the same, average long-term power spectral density 
are used. The total duration of the input signal is 10 seconds of which 5 seconds contain noise 
only and 5 seconds contain both the speech and the noise signal. For evaluation purposes, the 
speech and the noise signal have been recorded separately.

[0059] The microphone signals are pre-whitened prior to processing to improve
intelligibility, and the output is accordingly de-whitened. In the experiments, the microphones 
have been calibrated by means of recordings of an anechoic speech weighted noise signal 
positioned at 0°, measured while the microphone array is mounted on the head. A delay-and- 
sum beamformer is used as a fixed beamformer, since -in case of small microphone 
interspacing - it is known to be very robust to model errors. The blocking matrix B pairwise 
subtracts the time aligned calibrated microphone signals. 

[0060] To investigate the effect of the different parameter settings (i.e. $\mu$, $\mathbf{w}_0$) on the
performance, the filter coefficients are computed using (eq.33) where $E\{\mathbf{y}^s_{0:M-1}\mathbf{y}^{s,H}_{0:M-1}\}$ is
estimated by means of the clean speech contributions of the microphone signals. In practice,
$E\{\mathbf{y}^s_{0:M-1}\mathbf{y}^{s,H}_{0:M-1}\}$ is approximated using (eq. 27). The effect of the approximation (eq. 27) on
the performance was found to be small (i.e. differences of at most 0.5 dB in intelligibility 
weighted SNR improvement) for the given data set. The QIC-GSC is implemented using 
variable loading RLS. The filter length L per channel equals 96. 

[0061] To assess the performance of the different approaches, the broadband intelligibility 
weighted SNR improvement is used, defined as 

$$\Delta\mathrm{SNR}_{\mathrm{intellig}} = \sum_i I_i\,(\mathrm{SNR}_{i,\mathrm{out}} - \mathrm{SNR}_{i,\mathrm{in}}),$$ (equation 45)

where the band importance function $I_i$ expresses the importance of the i-th one-third octave
band with centre frequency $f_i^c$ for intelligibility, $\mathrm{SNR}_{i,\mathrm{out}}$ is the output SNR (in dB) and
$\mathrm{SNR}_{i,\mathrm{in}}$ is the input SNR (in dB) in the i-th one-third octave band ('ANSI S3.5-1997, American
National Standard Methods for Calculation of the Speech Intelligibility Index'). The
intelligibility weighted SNR reflects how much intelligibility is improved by the noise 
reduction algorithm, but does not take into account speech distortion. 
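
The measure in (eq.45) is a weighted sum of per-band SNR gains, which can be sketched as below. The band importance weights and SNR values in the example are hypothetical placeholders chosen for easy arithmetic; they are not the values tabulated in the ANSI standard.

```python
def intelligibility_weighted_snr_gain(importance, snr_out_db, snr_in_db):
    """Weighted sum of per-band SNR improvements (all SNRs in dB).

    importance: band importance weights I_i (assumed to sum to 1)
    snr_out_db: output SNR per one-third octave band
    snr_in_db:  input SNR per one-third octave band
    """
    return sum(
        w * (out - inp)
        for w, out, inp in zip(importance, snr_out_db, snr_in_db)
    )


# Two hypothetical bands with equal importance: gains of 6 dB and 4 dB.
gain_db = intelligibility_weighted_snr_gain([0.5, 0.5], [10.0, 6.0], [4.0, 2.0])
```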

[0062] To measure the amount of speech distortion, we define the following intelligibility 
weighted spectral distortion measure 

$$\mathrm{SD}_{\mathrm{intellig}} = \sum_i I_i\,\mathrm{SD}_i$$ (equation 46)

with $\mathrm{SD}_i$ the average spectral distortion (dB) in the i-th one-third octave band, measured as

$$\mathrm{SD}_i = \int_{2^{-1/6} f_i^c}^{2^{1/6} f_i^c} |10\log_{10} G^s(f)|\,df \Big/ \left[(2^{1/6} - 2^{-1/6})\,f_i^c\right],$$ (equation 47)

with $G^s(f)$ the power transfer function of speech from the input to the output of the noise
reduction algorithm. To exclude the effect of the spatial pre-processor, the performance
measures are calculated w.r.t. the output of the fixed beamformer.
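
The band-averaged spectral distortion of (eq.47) can be approximated numerically. The sketch below uses a midpoint rule over one one-third octave band; the callable `power_gain`, standing in for the speech power transfer function, and the number of integration steps are assumptions made for illustration only.

```python
import math


def band_spectral_distortion(power_gain, fc, steps=200):
    """Average of |10*log10(G(f))| over the one-third octave band at fc.

    power_gain: callable returning the (positive) speech power transfer
                function at frequency f
    fc:         band centre frequency in Hz
    """
    lo = 2.0 ** (-1.0 / 6.0) * fc
    hi = 2.0 ** (1.0 / 6.0) * fc
    df = (hi - lo) / steps
    total = 0.0
    for n in range(steps):
        f = lo + (n + 0.5) * df  # midpoint of the n-th sub-interval
        total += abs(10.0 * math.log10(power_gain(f))) * df
    return total / (hi - lo)


# A distortion-free path gives 0 dB; halving the power gives about 3 dB.
sd_clean = band_spectral_distortion(lambda f: 1.0, 1000.0)
sd_half = band_spectral_distortion(lambda f: 0.5, 1000.0)
```

Because the absolute value is taken inside the integral, attenuation and amplification of the speech both count as distortion.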

[0063] The impact of the different parameter settings for $\mu$ and $\mathbf{w}_0$ on the performance of
the SP-SDW-MWF is illustrated for a five noise source scenario. The five noise sources are 
positioned at angles 75°, 120°, 180°, 240°, 285° w.r.t. the desired source at 0°. To assess the 
sensitivity of the algorithm against errors in the assumed signal model, the influence of 
microphone mismatch, e.g., gain mismatch of the second microphone, on the performance is 
evaluated. Among the different possible signal model errors, microphone mismatch was 
found to be especially harmful to the performance of the GSC in a hearing aid application. In 
hearing aids, microphones are rarely matched in gain and phase. Gain and phase differences 
between microphone characteristics of up to 6 dB and 10°, respectively, have been reported. 

SP-SDW-MWF without $\mathbf{w}_0$ (SDR-GSC)

[0064] Fig. 6 plots the improvement $\Delta\mathrm{SNR}_{\mathrm{intellig}}$ and the speech distortion $\mathrm{SD}_{\mathrm{intellig}}$ as a
function of $1/\mu$ obtained by the SDR-GSC (i.e., the SP-SDW-MWF without filter $\mathbf{w}_0$) for
different gain mismatches $\Upsilon_2$ at the second microphone. In the absence of microphone
mismatch, the amount of speech leakage into the noise references is limited. Hence, the
amount of speech distortion is low for all $\mu$. Since there is still a small amount of speech
leakage due to reverberation, the amount of noise reduction and speech distortion slightly
decreases for increasing $1/\mu$, especially for $1/\mu > 1$. In the presence of microphone mismatch,
the amount of speech leakage into the noise references grows. For $1/\mu = 0$ (GSC), the speech
gets significantly distorted. Due to the cancellation of the desired signal, the
improvement $\Delta\mathrm{SNR}_{\mathrm{intellig}}$ also degrades. Setting $1/\mu > 0$ improves the performance of the GSC in
the presence of model errors without compromising performance in the absence of signal
model errors. For the given set-up, a value of $1/\mu$ around 0.5 seems appropriate for guaranteeing
good performance for a gain mismatch up to 4 dB.

SP-SDW-MWF with filter $\mathbf{w}_0$

[0065] Fig. 7 plots the performance measures $\Delta\mathrm{SNR}_{\mathrm{intellig}}$ and $\mathrm{SD}_{\mathrm{intellig}}$ of the SP-SDW-MWF
with filter $\mathbf{w}_0$. In general, the amount of speech distortion and noise reduction grows
for decreasing $1/\mu$. For $1/\mu = 0$, all emphasis is put on noise reduction. As also illustrated by
Fig. 7, this results in a total cancellation of the speech and the noise signal and hence
degraded performance. In the absence of model errors, the settings $L_0 = 0$ and $L_0 \neq 0$ result,
except for $1/\mu = 0$, in the same $\Delta\mathrm{SNR}_{\mathrm{intellig}}$, while the distortion for the SP-SDW-MWF with
$\mathbf{w}_0$ is higher due to the additional single-channel SDW-SWF. For $L_0 \neq 0$ the performance does,
in contrast to $L_0 = 0$, not degrade due to the microphone mismatch.

[0066] Fig. 8 depicts the improvement $\Delta\mathrm{SNR}_{\mathrm{intellig}}$ and the speech distortion $\mathrm{SD}_{\mathrm{intellig}}$,
respectively, of the QIC-GSC as a function of $\beta^2$. Like the SDR-GSC, the QIC increases the
robustness of the GSC. The QIC is independent of the amount of speech leakage. As a
consequence, distortion grows fast with increasing gain mismatch. The constraint value $\beta^2$
should be chosen such that the maximum allowable speech distortion level is not exceeded
for the largest possible model errors. Obviously, this comes at the expense of reduced noise
reduction for small model errors. The SDR-GSC, on the other hand, keeps the speech
distortion limited for all model errors (see Fig. 6). Emphasis on speech distortion is increased
if the amount of speech leakage grows. As a result, a better noise reduction performance is
obtained for small model errors, while guaranteeing sufficient robustness for large model
errors. In addition, Fig. 7 demonstrates that an additional filter $\mathbf{w}_0$ significantly improves the
performance in the presence of signal model errors.

[0067] In the previously discussed embodiments a generalised noise reduction scheme has 
been established, referred to as Spatially pre-processed, Speech Distortion Weighted Multi- 
channel Wiener Filter (SP-SDW-MWF), that comprises a fixed, spatial pre-processor and an 
adaptive stage that is based on a SDW-MWF. The new scheme encompasses the GSC and 
MWF as special cases. In addition, it allows for an in-between solution that can be interpreted 
as a Speech Distortion Regularised GSC (SDR-GSC). Depending on the setting of a trade-off
parameter $\mu$ and the presence or absence of the filter $\mathbf{w}_0$ on the speech reference, the GSC, the
SDR-GSC or a (SDW-)MWF is obtained. The different parameter settings of the SP-SDW- 
MWF can be interpreted as follows: 

• Without $\mathbf{w}_0$, the SP-SDW-MWF corresponds to an SDR-GSC: the ANC
design criterion is supplemented with a regularisation term that limits the speech
distortion due to signal model errors. The larger $1/\mu$, the smaller the amount of
distortion. For $1/\mu = 0$, distortion is completely ignored, which corresponds to the
GSC-solution. The SDR-GSC is then an alternative technique to the QIC-GSC to
decrease the sensitivity of the GSC to signal model errors. In contrast to the QIC-GSC,
the SDR-GSC shifts emphasis towards speech distortion when the amount of
speech leakage grows. In the absence of signal model errors, the performance of the
GSC is preserved. As a result, a better noise reduction performance is obtained for
small model errors, while guaranteeing robustness against large model errors.

• Since the SP-SDW-MWF takes speech distortion explicitly into account,
a filter $\mathbf{w}_0$ on the speech reference can be added. It can be shown that, in the absence
of speech leakage and for infinitely long filter lengths, the SP-SDW-MWF
corresponds to a cascade of an SDR-GSC with an SDW-SWF postfilter. In the
presence of speech leakage, the SP-SDW-MWF with $\mathbf{w}_0$ tries to preserve its
performance: the SP-SDW-MWF then contains extra filtering operations that
compensate for the performance degradation due to speech leakage. In contrast to the
SDR-GSC (and thus also the GSC), the performance does not degrade due to
microphone mismatch.

Experimental results for a hearing aid application confirm the theoretical results. The SP- 
SDW-MWF indeed increases the robustness of the GSC against signal model errors. A 
comparison with the widely studied QIC-GSC demonstrates that the SP-SDW-MWF achieves 
a better noise reduction performance for a given maximum allowable speech distortion level. 

Stochastic gradient implementations 

[0068] Recursive implementations of the (SDW-)MWF have been proposed based on a 
GSVD or QR decomposition. Additionally, a subband implementation results in improved 
intelligibility at a significantly lower cost compared to the fullband approach. These 
techniques can be extended to implement the SP-SDW-MWF. However, in contrast to the 
GSC and the QIC-GSC, no cheap stochastic gradient based implementation of the SP-SDW- 
MWF is available. In the present invention, time-domain and frequency-domain stochastic 
gradient implementations of the SP-SDW-MWF are proposed that preserve the benefit of 
matrix-based SP-SDW-MWF over QIC-GSC. Experimental results demonstrate that the 
proposed stochastic gradient implementations of the SP-SDW-MWF outperform the SPA, 
while their computational cost is limited. 

[0069] Starting from the cost function of the SP-SDW-MWF, a time-domain stochastic 
gradient algorithm is derived. To increase the convergence speed and reduce the
computational complexity, the stochastic gradient algorithm is implemented in the
frequency-domain. Since the stochastic gradient algorithm suffers from a large excess error when
applied in highly time-varying noise scenarios, the performance is improved by applying a 
low pass filter to the part of the gradient estimate that limits speech distortion. The low pass 
filter avoids a highly time-varying distortion of the desired speech component while not 
degrading the tracking performance needed in time-varying noise scenarios. Next, the 
performance of the different frequency-domain stochastic gradient algorithms is compared. 
Experimental results show that the proposed stochastic gradient algorithm preserves the 
benefit of the SP-SDW-MWF over the QIC-GSC. Finally, it is shown that the memory cost 
of the frequency-domain stochastic gradient algorithm with low pass filter is reduced by 
approximating the regularisation term in the frequency-domain using (diagonal) correlation 
matrices instead of data buffers. Experiments show that the stochastic gradient algorithm 
using correlation matrices has the same performance as the stochastic gradient algorithm with 
low pass filter. 

Stochastic gradient algorithm 

Derivation 

[0070] A stochastic gradient algorithm approximates the steepest descent algorithm, using 
an instantaneous gradient estimate. Given the cost function (eq.38), the steepest descent 
algorithm iterates as follows (note that in the sequel the subscripts 0:M-1 in the adaptive filter
$\mathbf{w}_{0:M-1}$ and the input vector $\mathbf{y}_{0:M-1}$ are omitted for the sake of conciseness):

$$\mathbf{w}[n+1] = \mathbf{w}[n] + \rho\left[E\{\mathbf{y}^n[k]\,y_0^{n,*}[k-\Delta]\} - E\{\mathbf{y}^n[k]\,\mathbf{y}^{n,H}[k]\}\,\mathbf{w}[n] - \frac{1}{\mu}E\{\mathbf{y}^s[k]\,\mathbf{y}^{s,H}[k]\}\,\mathbf{w}[n]\right]$$ (equation 48)

with $\mathbf{w}[k], \mathbf{y}[k] \in \mathbb{C}^{NL\times 1}$, where N denotes the number of input channels to the adaptive filter
and L the number of filter taps per channel. Replacing the iteration index n by a time index k
and leaving out the expectation values E{.}, one obtains the following update equation

$$\mathbf{w}[k+1] = \mathbf{w}[k] + \rho\Big\{\mathbf{y}^n[k]\big(y_0^{n,*}[k-\Delta] - \mathbf{y}^{n,H}[k]\,\mathbf{w}[k]\big) - \underbrace{\frac{1}{\mu}\,\mathbf{y}^s[k]\,\mathbf{y}^{s,H}[k]\,\mathbf{w}[k]}_{\mathbf{r}[k]}\Big\}.$$ (equation 49)

For $1/\mu = 0$ and no filter $\mathbf{w}_0$ on the speech reference, (eq.49) reduces to the update formula
used in GSC during periods of noise only (i.e., when $y_i[k] = y_i^n[k]$, $i = 0,\ldots,M-1$). The
additional term $\mathbf{r}[k]$ in the gradient estimate limits the speech distortion due to possible signal
model errors.

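[0071] Equation (49) requires knowledge of the correlation matrix $\mathbf{y}^s[k]\,\mathbf{y}^{s,H}[k]$ or
$E\{\mathbf{y}^s[k]\,\mathbf{y}^{s,H}[k]\}$ of the clean speech. In practice, this information is not available. To avoid
the need for calibration, speech + noise signal vectors $\mathbf{y}_{buf_1}$ are stored into a circular buffer
$\mathbf{B}_1 \in \mathbb{R}^{N\times L_{buf_1}}$ during processing. During periods of noise only (i.e., when
$y_i[k] = y_i^n[k]$, $i = 0,\ldots,M-1$), the filter $\mathbf{w}$ is updated using the following approximation of the
term $\mathbf{r}[k] = \frac{1}{\mu}\,\mathbf{y}^s[k]\,\mathbf{y}^{s,H}[k]\,\mathbf{w}[k]$ in (eq.49)

$$\frac{1}{\mu}\,\mathbf{y}^s[k]\,\mathbf{y}^{s,H}[k]\,\mathbf{w}[k] \approx \frac{1}{\mu}\left(\mathbf{y}_{buf_1}[k]\,\mathbf{y}^H_{buf_1}[k] - \mathbf{y}[k]\,\mathbf{y}^H[k]\right)\mathbf{w}[k],$$ (equation 50)

which results in the update formula

$$\mathbf{w}[k+1] = \mathbf{w}[k] + \rho\Big\{\mathbf{y}[k]\big(y_0^*[k-\Delta] - \mathbf{y}^H[k]\,\mathbf{w}[k]\big) - \underbrace{\frac{1}{\mu}\left(\mathbf{y}_{buf_1}[k]\,\mathbf{y}^H_{buf_1}[k] - \mathbf{y}[k]\,\mathbf{y}^H[k]\right)\mathbf{w}[k]}_{\mathbf{r}[k]}\Big\}.$$ (equation 51)

In the sequel, a normalised step size $\rho$ is used, i.e.

$$\rho = \frac{\rho'}{\frac{1}{\mu}\left|\mathbf{y}^H_{buf_1}[k]\,\mathbf{y}_{buf_1}[k] - \mathbf{y}^H[k]\,\mathbf{y}[k]\right| + \mathbf{y}^H[k]\,\mathbf{y}[k] + \delta}$$ (equation 52)

where $\delta$ is a small positive constant. The absolute value $|\mathbf{y}^H_{buf_1}\mathbf{y}_{buf_1} - \mathbf{y}^H\mathbf{y}|$ has been inserted to
guarantee a positive valued estimate of the clean speech energy $\mathbf{y}^{s,H}[k]\,\mathbf{y}^s[k]$. Additional
storage of noise only vectors $\mathbf{y}_{buf_2}$ in a second buffer $\mathbf{B}_2 \in \mathbb{R}^{M\times L_{buf_2}}$ allows $\mathbf{w}$ to be adapted also
during periods of speech + noise, using

$$\mathbf{w}[k+1] = \mathbf{w}[k] + \rho\Big\{\mathbf{y}_{buf_2}[k]\big(y^*_{0,buf_2}[k-\Delta] - \mathbf{y}^H_{buf_2}[k]\,\mathbf{w}[k]\big) + \frac{1}{\mu}\left(\mathbf{y}_{buf_2}[k]\,\mathbf{y}^H_{buf_2}[k] - \mathbf{y}[k]\,\mathbf{y}^H[k]\right)\mathbf{w}[k]\Big\}$$ (equation 53)

with

$$\rho = \frac{\rho'}{\frac{1}{\mu}\left|\mathbf{y}^H[k]\,\mathbf{y}[k] - \mathbf{y}^H_{buf_2}[k]\,\mathbf{y}_{buf_2}[k]\right| + \mathbf{y}^H_{buf_2}[k]\,\mathbf{y}_{buf_2}[k] + \delta}.$$ (equation 54)

For reasons of conciseness only the update procedure of the time-domain stochastic gradient
algorithms during noise only will be considered in the sequel, hence $\mathbf{y}[k] = \mathbf{y}^n[k]$. The
extension towards updating during speech + noise periods with the use of a second, noise
only buffer $\mathbf{B}_2$ is straightforward: the equations are found by replacing the noise-only input
vector $\mathbf{y}[k]$ by $\mathbf{y}_{buf_2}[k]$ and the speech + noise vector $\mathbf{y}_{buf_1}[k]$ by the input speech + noise
vector $\mathbf{y}[k]$.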
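
One noise-only iteration of the buffered update, with a step size normalised by the estimated speech energy, the noise energy and a small constant, can be sketched as follows. This is a minimal real-valued illustration (conjugates are therefore dropped); the function names, argument names and numbers are hypothetical, and a practical implementation would operate on the stacked multi-channel vectors described above.

```python
def dot(a, b):
    """Inner product of two equal-length real vectors."""
    return sum(x * y for x, y in zip(a, b))


def sg_update(w, y, y_buf1, d, inv_mu, rho_prime, delta=1e-8):
    """One noise-only stochastic gradient step (real-valued sketch).

    w:         current filter coefficients
    y:         current noise-only input vector
    y_buf1:    speech + noise vector taken from the circular buffer
    d:         delayed speech reference sample y_0[k - Delta]
    inv_mu:    trade-off parameter 1/mu
    rho_prime: un-normalised step size
    delta:     small positive constant in the normalisation
    """
    e = d - dot(w, y)  # residual after removing the noise estimate
    s_buf = dot(y_buf1, w)
    s_cur = dot(y, w)
    # Regularisation term r[k] built from the buffered and current vectors.
    r = [inv_mu * (yb * s_buf - yc * s_cur) for yb, yc in zip(y_buf1, y)]
    energy_gap = abs(dot(y_buf1, y_buf1) - dot(y, y))
    rho = rho_prime / (inv_mu * energy_gap + dot(y, y) + delta)
    return [wi + rho * (yi * e - ri) for wi, yi, ri in zip(w, y, r)]


# With 1/mu = 0 the step is a plain normalised LMS update.
w_nlms = sg_update([0.0, 0.0], [1.0, 2.0], [3.0, 4.0], 1.0, 0.0, 0.5, delta=0.0)

# With 1/mu > 0 and speech energy in the buffer, the weights are pulled
# towards zero even when the noise-only input is silent.
w_shrunk = sg_update([1.0, 0.0], [0.0, 0.0], [1.0, 0.0], 0.0, 1.0, 0.5, delta=0.0)
```

The second call isolates the effect of the regularisation term: the filter weight that acts on leaked speech is halved in a single step.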

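It can be shown that the algorithm (eq.51)-(eq.52) is convergent in the mean provided that the
step size $\rho$ is smaller than $2/\lambda_{max}$ with $\lambda_{max}$ the maximum eigenvalue of
$E\{\frac{1}{\mu}\,\mathbf{y}_{buf_1}\mathbf{y}^H_{buf_1} + (1 - \frac{1}{\mu})\,\mathbf{y}\mathbf{y}^H\}$. The similarity of (eq.51) with standard NLMS lets us presume that
setting $\rho < 2/\sum_{i=1}^{NL}\lambda_i$, with $\lambda_i$, $i = 1,\ldots,NL$ the eigenvalues of
$E\{\frac{1}{\mu}\,\mathbf{y}_{buf_1}\mathbf{y}^H_{buf_1} + (1 - \frac{1}{\mu})\,\mathbf{y}\mathbf{y}^H\} \in \mathbb{R}^{NL\times NL}$,
or, in case of FIR filters, setting

$$\rho < \frac{2}{\frac{1}{\mu}\,L\sum_{i=M-N}^{M-1} E\{y^2_{i,buf_1}[k]\} + \left(1 - \frac{1}{\mu}\right) L\sum_{i=M-N}^{M-1} E\{y_i^2[k]\}}$$ (equation 55)

guarantees convergence in the mean square. Equation (55) explains the normalisation (eq.52)
and (eq.54) for the step size $\rho$.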

[0072] However, since generally

$$\mathbf{y}[k]\,\mathbf{y}^H[k] \neq \mathbf{y}^n_{buf_1}[k]\,\mathbf{y}^{n,H}_{buf_1}[k],$$ (equation 56)

the instantaneous gradient estimate in (eq.51) is, compared to (eq.49), additionally perturbed
by

$$\frac{1}{\mu}\left(\mathbf{y}[k]\,\mathbf{y}^H[k] - \mathbf{y}^n_{buf_1}[k]\,\mathbf{y}^{n,H}_{buf_1}[k]\right)\mathbf{w}[k],$$ (equation 57)

for $1/\mu \neq 0$. Hence, for $1/\mu \neq 0$, the update equations (eq.51)-(eq.54) suffer from a larger
residual excess error than (eq.49). This additional excess error grows for decreasing $\mu$,
increasing step size $\rho$ and increasing vector length LN of the vector $\mathbf{y}$. It is expected to be
especially large for highly non-stationary noise, e.g. multi-talker babble noise.

Remark that for $\mu > 1$, an alternative stochastic gradient algorithm can be derived from
algorithm (eq.51)-(eq.54) by invoking some independence assumptions. Simulations,
however, showed that these independence assumptions result in a significant performance
degradation, while hardly reducing the computational complexity.

Frequency-domain implementation 

[0073] As stated before, the stochastic gradient algorithm (eq.51)-(eq.54) is expected to
suffer from a large excess error for large $\rho'/\mu$ and/or highly time-varying noise, due to a large
difference between the rank-one noise correlation matrices $\mathbf{y}^n[k]\,\mathbf{y}^{n,H}[k]$ measured at
different time instants k. The gradient estimate can be improved by replacing

$$\mathbf{y}_{buf_1}[k]\,\mathbf{y}^H_{buf_1}[k] - \mathbf{y}[k]\,\mathbf{y}^H[k]$$ (equation 58)

in (eq.51) with the time-average

$$\frac{1}{K}\sum_{l=k-K+1}^{k}\mathbf{y}_{buf_1}[l]\,\mathbf{y}^H_{buf_1}[l] - \frac{1}{K}\sum_{l=k-K+1}^{k}\mathbf{y}[l]\,\mathbf{y}^H[l],$$ (equation 59)

where $\frac{1}{K}\sum_{l=k-K+1}^{k}\mathbf{y}_{buf_1}[l]\,\mathbf{y}^H_{buf_1}[l]$ is updated during periods of speech + noise and
$\frac{1}{K}\sum_{l=k-K+1}^{k}\mathbf{y}[l]\,\mathbf{y}^H[l]$ during periods of noise only. However, this would require expensive
matrix operations. A block-based implementation intrinsically performs this averaging:

$$\mathbf{w}[(k+1)K] = \mathbf{w}[kK] + \frac{\rho}{K}\sum_{i=0}^{K-1}\Big\{\mathbf{y}[kK+i]\big(y_0^*[kK+i-\Delta] - \mathbf{y}^H[kK+i]\,\mathbf{w}[kK]\big) - \frac{1}{\mu}\left(\mathbf{y}_{buf_1}[kK+i]\,\mathbf{y}^H_{buf_1}[kK+i] - \mathbf{y}[kK+i]\,\mathbf{y}^H[kK+i]\right)\mathbf{w}[kK]\Big\}.$$ (equation 60)

The gradient, and hence also $\mathbf{y}_{buf_1}[k]\,\mathbf{y}^H_{buf_1}[k] - \mathbf{y}[k]\,\mathbf{y}^H[k]$, is averaged over K iterations prior to
making adjustments to $\mathbf{w}$. This comes at the expense of a reduced (i.e. by a factor K)
convergence rate.

[0074] The block-based implementation is computationally more efficient when it is 
implemented in the frequency-domain, especially for large filter lengths : the linear 
convolutions and correlations can then be efficiently realised by FFT algorithms based on 
overlap-save or overlap-add. In addition, in a frequency-domain implementation, each
frequency bin gets its own step size, resulting in faster convergence compared to a
time-domain implementation while not degrading the steady-state excess MSE.
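
The overlap-save idea used for the linear convolutions can be illustrated with a small generic sketch: two consecutive length-L blocks are transformed together, multiplied bin by bin with the transform of the zero-padded filter, and only the last L output samples are kept because they are free of circular wrap-around. A plain O(N²) DFT is used here to keep the example self-contained; the function names are hypothetical and the block is not the patent's Algorithm 1.

```python
import cmath


def dft(x, inverse=False):
    """Direct discrete Fourier transform (or its inverse) of a sequence."""
    n_pts = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n_pts):
        acc = 0j
        for n in range(n_pts):
            acc += x[n] * cmath.exp(sign * 2j * cmath.pi * k * n / n_pts)
        out.append(acc / n_pts if inverse else acc)
    return out


def overlap_save_block(prev_block, cur_block, w):
    """Filter cur_block with the length-L FIR w using overlap-save."""
    L = len(w)
    Y = dft(prev_block + cur_block)   # 2L-point transform of the input
    W = dft(w + [0.0] * L)            # zero-padded filter
    full = dft([a * b for a, b in zip(Y, W)], inverse=True)
    return [v.real for v in full[L:]]  # keep the last L (valid) samples


# Two-tap moving sum applied to the block [3, 4] preceded by [1, 2].
out = overlap_save_block([1.0, 2.0], [3.0, 4.0], [1.0, 1.0])
```

Replacing the direct transform by an FFT gives the complexity advantage referred to in the text; the selection of the last L samples plays the role of a windowing matrix.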

[0075] Algorithm 1 summarises a frequency-domain implementation based on overlap-save
of (eq.51)-(eq.54). Algorithm 1 requires (3N+4) FFTs of length 2L. By storing the
FFT-transformed speech + noise and noise only vectors in the buffers $\mathbf{B}_1 \in \mathbb{C}^{N\times L_{buf_1}}$ and
$\mathbf{B}_2 \in \mathbb{C}^{N\times L_{buf_2}}$, respectively, instead of storing the time-domain vectors, N FFT operations can
be saved. Note that since the input signals are real, half of the FFT components are
complex-conjugated. Hence, in practice only half of the complex FFT components have to be stored in
memory. When adapting during speech + noise, also the time-domain vector

$$[y_0[kL-\Delta] \;\; \cdots \;\; y_0[kL-\Delta+L-1]]^T$$ (equation 61)

should be stored in an additional buffer $\mathbf{B}_{2,0}$ during periods of noise-only, which -for
N=M- results in an additional storage of words compared to when the time-domain
vectors are stored into the buffers $\mathbf{B}_1$ and $\mathbf{B}_2$.

Remark that in Algorithm 1 a common trade-off parameter $\mu$ is used in all frequency bins.
Alternatively, a different setting for $\mu$ can be used in different frequency bins. E.g. for
SP-SDW-MWF with $\mathbf{w}_0 = 0$, $1/\mu$ could be set to 0 at those frequencies where the GSC is
sufficiently robust, e.g., for small-sized arrays at high frequencies. In that case, only a few
frequency components of the regularisation terms $\mathbf{R}_i[k]$, $i = M-N,\ldots,M-1$, need to be computed,
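reducing the computational complexity.

Algorithm 1: Frequency-domain stochastic gradient SP-SDW-MWF based on overlap-save

Initialisation:

$\mathbf{W}_i[0] = [0 \;\; \cdots \;\; 0]^T$, $i = M-N,\ldots,M-1$
$P_m[0] = \delta_m$, $m = 0,\ldots,2L-1$

Matrix definitions:

$\mathbf{g} = \begin{bmatrix}\mathbf{I}_L & \mathbf{0}_L \\ \mathbf{0}_L & \mathbf{0}_L\end{bmatrix}$; $\mathbf{k} = [\mathbf{0}_L \;\; \mathbf{I}_L]$; $\mathbf{F}$ = 2L×2L DFT matrix

For each new block of NL input samples:

♦ If noise detected:

1. $\mathbf{F}[y_i[kL-L] \;\; \cdots \;\; y_i[kL+L-1]]^T$, $i = M-N,\ldots,M-1$ → noise buffer $\mathbf{B}_2$
   $[y_0[kL-\Delta] \;\; \cdots \;\; y_0[kL-\Delta+L-1]]^T$ → noise buffer $\mathbf{B}_{2,0}$

2. $\mathbf{Y}_i^n[k] = \mathrm{diag}\{\mathbf{F}[y_i[kL-L] \;\; \cdots \;\; y_i[kL+L-1]]^T\}$, $i = M-N,\ldots,M-1$
   $\mathbf{d}[k] = [y_0[kL-\Delta] \;\; \cdots \;\; y_0[kL-\Delta+L-1]]^T$
   Create $\mathbf{Y}_i[k]$ from data in speech + noise buffer $\mathbf{B}_1$.

♦ If speech detected:

1. $\mathbf{F}[y_i[kL-L] \;\; \cdots \;\; y_i[kL+L-1]]^T$, $i = M-N,\ldots,M-1$ → speech + noise buffer $\mathbf{B}_1$
2. $\mathbf{Y}_i[k] = \mathrm{diag}\{\mathbf{F}[y_i[kL-L] \;\; \cdots \;\; y_i[kL+L-1]]^T\}$, $i = M-N,\ldots,M-1$
   Create $\mathbf{d}[k]$ and $\mathbf{Y}_i^n[k]$ from noise buffers $\mathbf{B}_{2,0}$ and $\mathbf{B}_2$.

♦ Update formula:

1. $\mathbf{e}_1[k] = \mathbf{k}\mathbf{F}^{-1}\sum_{j=M-N}^{M-1}\mathbf{Y}_j^n[k]\,\mathbf{W}_j[k] = \mathbf{y}_{out,1}$
   $\mathbf{e}[k] = \mathbf{d}[k] - \mathbf{e}_1[k]$
   $\mathbf{e}_2[k] = \mathbf{k}\mathbf{F}^{-1}\sum_{j=M-N}^{M-1}\mathbf{Y}_j[k]\,\mathbf{W}_j[k] = \mathbf{y}_{out,2}$
   $\mathbf{E}_1[k] = \mathbf{F}\mathbf{k}^T\mathbf{e}_1[k]$; $\mathbf{E}_2[k] = \mathbf{F}\mathbf{k}^T\mathbf{e}_2[k]$; $\mathbf{E}[k] = \mathbf{F}\mathbf{k}^T\mathbf{e}[k]$

2. $\mathbf{\Lambda}[k] = \frac{2\rho'}{L}\,\mathrm{diag}\{P_0^{-1}[k], \ldots, P_{2L-1}^{-1}[k]\}$

3. $\mathbf{W}_i[k+1] = \mathbf{W}_i[k] + \mathbf{F}\mathbf{g}\mathbf{F}^{-1}\mathbf{\Lambda}[k]\left\{\mathbf{Y}_i^{n,H}[k]\,\mathbf{E}[k] - \frac{1}{\mu}\left(\mathbf{Y}_i^H[k]\,\mathbf{E}_2[k] - \mathbf{Y}_i^{n,H}[k]\,\mathbf{E}_1[k]\right)\right\}$,
   ($i = M-N,\ldots,M-1$)

♦ Output: $\mathbf{y}_0[k] = [y_0[kL-\Delta] \;\; \cdots \;\; y_0[kL-\Delta+L-1]]^T$

• If noise detected: $\mathbf{y}_{out}[k] = \mathbf{y}_0[k] - \mathbf{y}_{out,1}[k]$

• If speech detected: $\mathbf{y}_{out}[k] = \mathbf{y}_0[k] - \mathbf{y}_{out,2}[k]$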

Improvement 1: stochastic gradient algorithm with low pass filter 

[0076] For spectrally stationary noise, the limited (i.e. K=L) averaging of (eq.59) by the 
block-based and frequency-domain stochastic gradient implementation may offer a 
reasonable estimate of the short-term speech correlation matrix $E\{\mathbf{y}^s\mathbf{y}^{s,H}\}$. However, in
practical scenarios, the speech and the noise signals are often spectrally highly non-stationary 
(e.g. multi-talker babble noise) while their long-term spectral and spatial characteristics (e.g. 
the positions of the sources) usually vary more slowly in time. For these scenarios, a reliable 
estimate of the long-term speech correlation matrix $E\{\mathbf{y}^s\mathbf{y}^{s,H}\}$ that captures the spatial rather
than the short-term spectral characteristics can still be obtained by averaging (eq.59) over
$K \gg L$ samples. Spectrally highly non-stationary noise can then still be spatially suppressed
by using an estimate of the long-term speech correlation matrix in the regularisation term
$\mathbf{r}[k]$. A cheap method to incorporate a long-term averaging ($K \gg L$) of (eq.59) in the
stochastic gradient algorithm is now proposed, by low pass filtering the part of the gradient 
estimate that takes speech distortion into account (i.e. the term r[k] in (eq.51)). The averaging 
method is first explained for the time-domain algorithm (eq.51)-(eq.54) and then translated to 
the frequency-domain implementation. 

Assume that the long-term spectral and spatial characteristics of the noise are quasi-stationary
during at least K speech + noise samples and K noise samples. A reliable estimate of the
long-term speech correlation matrix $E\{y^s y^{s,H}\}$ is then obtained by (eq.59) with $K \gg L$. To avoid
expensive matrix computations, r[k] can be approximated by

$\frac{1}{K} \sum_{l=k-K+1}^{k} \frac{1}{\mu} \left( y_{buf_1}[l]\, y_{buf_1}^{H}[l] - y[l]\, y^{H}[l] \right) w[l].$    (equation 62)

Since the filter coefficients w of a stochastic gradient algorithm vary slowly in time, (eq.62)
appears a good approximation of r[k], especially for small step size $\rho'$.

The averaging operation (eq.62) is performed by applying a low pass filter to r[k] in (eq.51):

$r[k] = \lambda\, r[k-1] + (1-\lambda) \frac{1}{\mu} \left( y_{buf_1}[k]\, y_{buf_1}^{H}[k] - y[k]\, y^{H}[k] \right) w[k]$    (equation 63)

where $\lambda < 1$. This corresponds to an averaging window K of about $\frac{1}{1-\lambda}$ samples. The
normalised step size $\rho$ is modified into

$\rho = \frac{\rho'}{r_{avg}[k] + y^{H}[k]\, y[k] + \delta}$    (equation 64)

with

$r_{avg}[k] = \lambda\, r_{avg}[k-1] + (1-\lambda) \frac{1}{\mu} \left| y_{buf_1}^{H}[k]\, y_{buf_1}[k] - y^{H}[k]\, y[k] \right|.$    (equation 65)

Compared to (eq.51), (eq.63) requires 3NL-1 additional MAC and extra storage of the NL×1
vector r[k].
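The recursions (eq.63)-(eq.65) can be sketched in a few lines. This is only an illustration under stated assumptions: the signals are real-valued, the vectors are plain Python lists, and all function names and toy values are hypothetical. It shows that the term $(y\,y^{H})\,w$ can be evaluated as $y\,(y^{H}w)$, so that only vector operations are needed.

```python
def dot(a, b):
    """Inner product of two equal-length real vectors."""
    return sum(x * y for x, y in zip(a, b))


def lowpass_regulariser(r_prev, y_buf, y_noise, w, lam, mu):
    """One step of (eq.63) for real-valued vectors.

    (y y^T) w is evaluated as y * (y^T w), so no NL x NL matrix is formed.
    """
    s_proj = dot(y_buf, w)    # projection of the speech + noise buffer vector on w
    n_proj = dot(y_noise, w)  # projection of the current noise vector on w
    return [
        lam * r + (1.0 - lam) / mu * (yb * s_proj - yn * n_proj)
        for r, yb, yn in zip(r_prev, y_buf, y_noise)
    ]


def lowpass_power(r_avg_prev, y_buf, y_noise, lam, mu):
    """One step of (eq.65): smoothed magnitude of the power difference."""
    diff = abs(dot(y_buf, y_buf) - dot(y_noise, y_noise))
    return lam * r_avg_prev + (1.0 - lam) / mu * diff


def normalised_step(rho_prime, r_avg, y_noise, delta=1e-8):
    """Normalised step size of (eq.64)."""
    return rho_prime / (r_avg + dot(y_noise, y_noise) + delta)


# Toy data (hypothetical values, NL = 3)
w = [1.0, 0.5, 0.0]
y_buf = [1.0, 2.0, 0.0]
y_noise = [0.0, 1.0, 1.0]
r = lowpass_regulariser([0.0, 0.0, 0.0], y_buf, y_noise, w, lam=0.0, mu=2.0)
r_avg = lowpass_power(1.0, y_buf, y_noise, lam=0.5, mu=2.0)
rho = normalised_step(0.8, r_avg, y_noise, delta=0.0)
```

With `lam=0.0` the recursion falls back to the unfiltered instantaneous term, which is a convenient sanity check.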

[0077] Equation (63) can be easily extended to the frequency-domain. The update
equation for $W_i[k+1]$ in Algorithm 1 then becomes (Algorithm 2):

$W_i[k+1] = W_i[k] + F g F^{-1} \Lambda[k] \left\{ Y_i^{n,H}[k]\, E[k] - R_i[k] \right\},$

$R_i[k] = \lambda\, R_i[k-1] + (1-\lambda) \frac{1}{\mu} \left( Y_i^{H}[k]\, E_2[k] - Y_i^{n,H}[k]\, E_1[k] \right)$    (equation 66)

with

$E[k] = F k^{T} \left( y_0[k] - k F^{-1} \sum_{j=M-N}^{M-1} Y_j^{n}[k]\, W_j[k] \right);$    (equation 67)

$E_1[k] = F k^{T} k F^{-1} \sum_{j=M-N}^{M-1} Y_j^{n}[k]\, W_j[k];$    (equation 68)

$E_2[k] = F k^{T} k F^{-1} \sum_{j=M-N}^{M-1} Y_j[k]\, W_j[k].$    (equation 69)

and $\Lambda[k]$ computed as follows:

$\Lambda[k] = \frac{2\rho'}{L}\, \mathrm{diag}\left\{ P_0^{-1}[k], \ldots, P_{2L-1}^{-1}[k] \right\}$    (equation 70)

$P_m[k] = \gamma P_m[k-1] + (1-\gamma) \left( P_{1,m}[k] + P_{2,m}[k] \right)$    (equation 71)

$P_{1,m}[k] = \sum_{j=M-N}^{M-1} \left| Y_{j,m}^{n}[k] \right|^2$    (equation 72)

$P_{2,m}[k] = \lambda P_{2,m}[k-1] + (1-\lambda) \frac{1}{\mu} \left| \sum_{j=M-N}^{M-1} \left( \left| Y_{j,m}[k] \right|^2 - \left| Y_{j,m}^{n}[k] \right|^2 \right) \right|$    (equation 73)

Compared to Algorithm 1, (eq.66)-(eq.69) require one extra 2L-point FFT and 8NL-2N-2L
extra MAC per L samples and additional memory storage of a 2NL×1 real data vector. To
obtain the same time constant in the averaging operation as in the time-domain version with
K=1, $\lambda$ should equal the L-th power of the time-domain weighting factor. The experimental
results that follow will show that the performance of the stochastic gradient algorithm is
significantly improved by the low pass filter, especially for large $\lambda$.
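As an illustration of the step size normalisation of Algorithm 2, the per-bin recursion (eq.71) and the diagonal of $\Lambda[k]$ can be sketched as follows. The function names are hypothetical, and the parameter `scale` stands in for the constant in front of the diagonal matrix in (eq.70). The helper `block_forgetting_factor` assumes that the time-domain filter is updated once per sample and the frequency-domain filter once per block of L samples, in which case equal time constants require the block factor to be the L-th power of the per-sample factor.

```python
def block_forgetting_factor(lam_sample, L):
    """Per-block factor that decays as fast as a per-sample factor over L samples."""
    return lam_sample ** L


def update_bin_power(P_prev, P1, P2, gamma):
    """(eq.71): recursive power estimate for every frequency bin m."""
    return [gamma * p + (1.0 - gamma) * (p1 + p2)
            for p, p1, p2 in zip(P_prev, P1, P2)]


def step_size_diagonal(P, scale):
    """Diagonal of Lambda[k] in (eq.70): scale / P_m for every bin m."""
    return [scale / p for p in P]


# Toy values for two frequency bins (hypothetical)
P = update_bin_power([1.0, 2.0], P1=[3.0, 0.0], P2=[1.0, 2.0], gamma=0.5)
Lam = step_size_diagonal(P, scale=1.0)
lam_block = block_forgetting_factor(0.5, L=2)
```

Bins with a large power estimate receive a small step size, which is the usual bin-wise normalisation of frequency-domain adaptive filters.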

[0078] Now the computational complexity of the different stochastic gradient algorithms
is discussed. Table 1 summarises the computational complexity (expressed as the number of
real multiply-accumulates (MAC), divisions (D), square roots (Sq) and absolute values
(Abs)) of the time-domain (TD) and the frequency-domain (FD) Stochastic Gradient (SG)
based algorithms. Comparison is made with standard NLMS and the NLMS based SPA. One
complex multiplication is assumed to be equivalent to 4 real multiplications and 2 real
additions. A 2L-point FFT of a real input vector requires $2L \log_2 2L$ real MAC (assuming a
radix-2 FFT algorithm).

Table 1 indicates that the TD-SG algorithm without filter $w_0$ and the SPA are about twice as
complex as the standard ANC. When applying a Low Pass filter (LP) to the regularisation
term, the TD-SG algorithm has about three times the complexity of the ANC. The increase in
complexity of the frequency-domain implementations is less.

Algorithm                     | update formula                                                     | step size adaptation
TD  NLMS ANC                  | ((2M-2)L + 1) MAC                                                  | 1 D + (M-1)L MAC
TD  NLMS based SPA            | (4(M-1)L + 1) MAC + 1 D + 1 Sq                                     | 1 D + (M-1)L MAC
TD  SG                        | (4NL + 5) MAC                                                      | 1 D + 1 Abs + (2NL + 2) MAC
TD  SG with LP                | (7NL + 4) MAC                                                      | 1 D + 1 Abs + (2NL + 4) MAC
FD  NLMS ANC                  | (10M - 7 - [illegible]) + (6M-2) log2(2L) MAC                      | 1 D + (2M + 2) MAC
FD  NLMS based SPA            | (14M - 11 - [illegible]) + (6M-2) log2(2L) MAC + 1/L Sq + 1/L D    | 1 D + (2M + 2) MAC
FD  SG (Algorithm 1)          | (18N + 6 - [illegible]) + (6N + 8) log2(2L) MAC                    | 1 D + 1 Abs + (4N + 4) MAC
FD  SG with LP (Algorithm 2)  | (26N + 4 - [illegible]) + (6N + 10) log2(2L) MAC                   | 1 D + 1 Abs + (4N + 6) MAC

Table 1
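The time-domain rows of Table 1 can be evaluated numerically to check the stated complexity ratios. The sketch below is an illustration only: it counts MAC operations and ignores the single divisions, square roots and absolute values, and the function names as well as the choice M=3, L=32, N=M-1 are assumptions made for the example.

```python
def td_mac_per_sample(M, L, N):
    """MAC counts (update formula + step size adaptation) of the TD rows of Table 1."""
    return {
        "NLMS ANC": (2 * (M - 1) * L + 1) + (M - 1) * L,
        "NLMS based SPA": (4 * (M - 1) * L + 1) + (M - 1) * L,
        "SG": (4 * N * L + 5) + (2 * N * L + 2),
        "SG with LP": (7 * N * L + 4) + (2 * N * L + 4),
    }


def mops(ops_per_sample, fs_hz):
    """Convert an operation count per sample into Mega operations per second."""
    return ops_per_sample * fs_hz / 1e6


counts = td_mac_per_sample(M=3, L=32, N=2)  # N = M-1, i.e. without filter w_0
ratios = {name: c / counts["NLMS ANC"] for name, c in counts.items()}
```

For these values the SG algorithm costs about 2.0 times and the SG algorithm with LP about 3.0 times the MAC count of the NLMS ANC; `mops(counts["SG with LP"], 16000)` converts the count into Mops at a 16 kHz sampling frequency.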



[0079] As an illustration, Fig. 9 plots the complexity (expressed as the number of Mega 
operations per second (Mops)) of the time-domain and the frequency-domain stochastic 
gradient algorithm with LP filter as a function of L for M=3 and a sampling frequency $f_s$ = 16
kHz. Comparison is made with the NLMS-based ANC of the GSC and the SPA. The 
complexity of the FD SPA is not depicted, since for small M, it is comparable to the cost of 
the FD-NLMS ANC. For L>8, the frequency-domain implementations result in a 
significantly lower complexity compared to their time-domain equivalents. The 
computational complexity of the FD stochastic gradient algorithm with LP is limited, making 
it a good alternative to the SPA for implementation in hearing aids. 

In Table 1 and Fig. 9 the complexity of the time-domain and the frequency-domain NLMS 
ANC and NLMS based SPA represents the complexity when the adaptive filter is only 
updated during noise only. If the adaptive filter is also updated during speech + noise using 
data from a noise buffer, the time-domain implementations additionally require NL MAC per 
sample and the frequency-domain implementations additionally require 2 FFT and (4L(M-1)- 
2(M-1)+L) MAC per L samples. 

[0080] The performance of the different FD stochastic gradient implementations of the 
SP-SDW-MWF is evaluated based on experimental results for a hearing aid application. 
Comparison is made with the FD-NLMS based SPA. For a fair comparison, the FD-NLMS
based SPA is, like the stochastic gradient algorithms, also adapted during speech + noise
using data from a noise buffer.

[0081] The set-up is the same as described before (see also Fig. 5). The performance of
the FD stochastic gradient algorithms is evaluated for a filter length L=32 taps per channel,
$\rho'$=0.8 and $\gamma$=0. To exclude the effect of the spatial pre-processor, the performance measures
are calculated w.r.t. the output of the fixed beamformer. The sensitivity of the algorithms
against errors in the assumed signal model is illustrated for microphone mismatch, e.g. a gain
mismatch $\Upsilon_2$ = 4 dB of the second microphone.

[0082] Fig. 10(a) and (b) compare the performance of the different FD Stochastic Gradient
(SG) SP-SDW-MWF algorithms without $w_0$ (i.e., the SDR-GSC) as a function of the
trade-off parameter $\mu$ for a stationary and a non-stationary (e.g. multi-talker babble) noise source,
respectively, at 90°. To analyse the impact of the approximation (eq.50) on the performance,
the result of a FD implementation of (eq.49), which uses the clean speech, is depicted too.
This algorithm is referred to as optimal FD-SG algorithm. Without Low Pass (LP) filter, the
stochastic gradient algorithm achieves a worse performance than the optimal FD-SG
algorithm (eq.49), especially for large $1/\mu$. For a stationary speech-like noise source, the
FD-SG algorithm does not suffer too much from approximation (eq.50). In a highly time-varying
noise scenario, such as multi-talker babble, the limited averaging of r[k] in the FD
implementation does not suffice to maintain the large noise reduction achieved by (eq.49).
The loss in noise reduction performance could be reduced by decreasing the step size $\rho'$, at
the expense of a reduced convergence speed. Applying the low pass filter (eq.66) with e.g.
$\lambda$=0.999 significantly improves the performance for all $1/\mu$, while changes in the noise
scenario can still be tracked.

[0083] Fig. 11 plots the SNR improvement $\Delta SNR_{intellig}$ and the speech distortion $SD_{intellig}$
of the SP-SDW-MWF ($1/\mu$=0.5) with and without filter $w_0$ for the babble noise scenario as a
function of the exponential weighting factor $\lambda$ of the LP filter (see (eq.66)).
Performance clearly improves for increasing $\lambda$. For small $\lambda$, the SP-SDW-MWF with $w_0$
suffers from a larger excess error (and hence a worse $\Delta SNR_{intellig}$) compared to the
SP-SDW-MWF without $w_0$. This is due to the larger dimensions of $E\{y^s y^{s,H}\}$.

[0084] The LP filter reduces fluctuations in the filter weights $W_i[k]$ caused by poor
estimates of the short-term speech correlation matrix $E\{y^s y^{s,H}\}$ and/or by the highly
non-stationary short-term speech spectrum. In contrast to a decrease in step size $\rho'$, the LP filter
does not compromise tracking of changes in the noise scenario. As an illustration, Fig. 12
plots the convergence behaviour of the FD stochastic gradient algorithm without $w_0$ (i.e. the
SDR-GSC) for $\lambda$=0 and $\lambda$=0.9998, respectively, when the noise source position suddenly
changes from 90° to 180°. A gain mismatch $\Upsilon_2$ of 4 dB was applied to the second
microphone. To avoid fast fluctuations in the residual noise energy $\varepsilon_n^2$ and the speech
distortion energy $\varepsilon_d^2$, the desired and the interfering noise source in this experiment are
stationary and speech-like. The upper figure depicts the residual noise energy $\varepsilon_n^2$ as a function of
the number of input samples; the lower figure plots the residual speech distortion $\varepsilon_d^2$ during
speech + noise periods as a function of the number of speech + noise samples. Both
algorithms (i.e., $\lambda$=0 and $\lambda$=0.9998) have about the same convergence rate. When the change
in position occurs, the algorithm with $\lambda$=0.9998 even converges faster. For $\lambda$=0, the
approximation error (eq.50) remains large for a while since the noise vectors in the buffer are
not up to date. For $\lambda$=0.9998, the impact of the instantaneous large approximation error is
reduced thanks to the low pass filter.

[0085] Fig. 13 and Fig. 14 compare the performance of the FD stochastic gradient
algorithm with LP filter ($\lambda$=0.9998) and the FD-NLMS based SPA in a multiple noise source
scenario. The noise scenario consists of 5 multi-talker babble noise sources positioned at
angles 75°, 120°, 180°, 240°, 285° w.r.t. the desired source at 0°. To assess the sensitivity of the
algorithms against errors in the assumed signal model, the influence of microphone
mismatch, i.e. gain mismatch $\Upsilon_2$ = 4 dB of the second microphone, on the performance is
depicted too. In Fig. 13, the SNR improvement $\Delta SNR_{intellig}$ and the speech distortion $SD_{intellig}$
of the SP-SDW-MWF with and without filter $w_0$ is depicted as a function of the trade-off
parameter $1/\mu$. Fig. 14 shows the performance of the QIC-GSC

$w^{H} w \leq \beta^2$    (equation 74)

for different constraint values $\beta^2$, which is implemented using the FD-NLMS based SPA.
The SPA and the stochastic gradient based SP-SDW-MWF both increase the robustness of
the GSC (i.e., the SP-SDW-MWF without $w_0$ and $1/\mu$=0). For a given maximum allowable
speech distortion $SD_{intellig}$, the SP-SDW-MWF with and without $w_0$ achieve a better noise
reduction performance than the SPA. The performance of the SP-SDW-MWF with $w_0$ is, in
contrast to the SP-SDW-MWF without $w_0$, not affected by microphone mismatch. In the
absence of model errors, the SP-SDW-MWF with $w_0$ achieves a slightly worse performance
than the SP-SDW-MWF without $w_0$. This can be explained by the fact that with $w_0$, the
estimate of $\frac{1}{\mu} E\{y^s y^{s,H}\}$ is less accurate due to the larger dimensions of $\frac{1}{\mu} E\{y^s y^{s,H}\}$ (see also
Fig. 11). In conclusion, the proposed stochastic gradient implementation of the
SP-SDW-MWF preserves the benefit of the SP-SDW-MWF over the QIC-GSC.
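For reference, the quadratic inequality constraint (eq.74) can be enforced after each filter update by rescaling the weight vector whenever its squared norm exceeds $\beta^2$. The sketch below shows this generic scaled projection step under that assumption; it is not taken from this document, and the function name and values are hypothetical.

```python
import math


def scaled_projection(w, beta_sq):
    """Rescale w so that w^H w <= beta_sq (eq.74); return a copy if already satisfied."""
    norm_sq = sum(abs(c) ** 2 for c in w)
    if norm_sq <= beta_sq:
        return list(w)
    scale = math.sqrt(beta_sq / norm_sq)  # one division and one square root
    return [c * scale for c in w]


w_projected = scaled_projection([3.0, 4.0], beta_sq=1.0)   # norm 5 is scaled down to 1
w_unchanged = scaled_projection([0.3, 0.4], beta_sq=1.0)   # already inside the constraint
```

The projection leaves the direction of the weight vector unchanged and only limits its norm.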

Improvement 2: frequency-domain stochastic gradient algorithm using correlation matrices
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[0086] It is now shown that by approximating the regularisation term in the frequency- 
domain, (diagonal) speech and noise correlation matrices can be used instead of data buffers, 
such that the memory usage is decreased drastically, while also the computational complexity 
is further reduced. Experimental results demonstrate that this approximation results in a small 
-positive or negative- performance difference compared to the stochastic gradient algorithm 
with low pass filter, such that the proposed algorithm preserves the robustness benefit of the 
SP-SDW-MWF over the QIC-GSC, while both its computational complexity and memory 
usage are now comparable to the NLMS-based SPA for implementing the QIC-GSC. 

[0087] As the estimate of r[k] in (eq.51) proved to be quite poor, resulting in a large
excess error, it was suggested in (eq.59) to use an estimate of the average clean speech
correlation matrix. This allows r[k] to be computed as

$r[k] = \frac{1}{\mu} (1-\lambda) \sum_{l=0}^{k} \lambda^{k-l} \left( y_{buf_1}[l]\, y_{buf_1}^{H}[l] - y^{n}[l]\, y^{n,H}[l] \right) \cdot w[k],$    (equation 75)

with $\lambda$ an exponential weighting factor. For stationary noise a small $\lambda$, i.e. $1/(1-\lambda) \sim NL$,
suffices. However, in practice the speech and the noise signals are often spectrally highly
non-stationary (e.g. multi-talker babble noise), whereas their long-term spectral and spatial
characteristics usually vary more slowly in time. Spectrally highly non-stationary noise can
still be spatially suppressed by using an estimate of the long-term correlation matrix in r[k],
i.e. $1/(1-\lambda) \gg NL$. In order to avoid expensive matrix operations for computing (eq.75), it
was previously assumed that w[k] varies slowly in time, i.e. $w[k] \approx w[l]$, such that (eq.75) can
be approximated with vector instead of matrix operations by directly applying a low pass
filter to the regularisation term r[k], cf. (eq.63),

$r[k] = \frac{1}{\mu} (1-\lambda) \sum_{l=0}^{k} \lambda^{k-l} \left( y_{buf_1}[l]\, y_{buf_1}^{H}[l] - y^{n}[l]\, y^{n,H}[l] \right) \cdot w[l]$    (equation 76)

$\phantom{r[k]} = \lambda\, r[k-1] + (1-\lambda) \frac{1}{\mu} \left( y_{buf_1}[k]\, y_{buf_1}^{H}[k] - y^{n}[k]\, y^{n,H}[k] \right) w[k].$    (equation 77)

However, this assumption is actually not required in a frequency-domain implementation, as
will now be shown.

[0088] The frequency-domain algorithm called Algorithm 2 requires large data buffers
and hence the storage of a large amount of data (note that to achieve a good performance,
typical values for the buffer lengths of the circular buffers $B_1$ and $B_2$ are 10000...20000). A
substantial memory (and computational complexity) reduction can be achieved by the
following two steps:

• When using (eq.75) instead of (eq.77) for calculating the regularisation term,
correlation matrices instead of data samples need to be stored. The frequency-domain
implementation of the resulting algorithm is summarised in Algorithm 3, where
2L×2L-dimensional speech and noise correlation matrices $S_{ij}[k]$ and
$S_{ij}^{n}[k]$, i,j = M-N...M-1, are used for calculating the regularisation term $R_i[k]$ and
(part of) the step size $\Lambda[k]$. These correlation matrices are updated respectively during
speech + noise periods and noise only periods. When using correlation matrices, filter
adaptation can only take place during noise only periods, since during speech + noise
periods the desired signal cannot be constructed from the noise buffer $B_2$ anymore.
This first step however does not necessarily reduce the memory usage ($NL_{buf_1}$ for data
buffers vs. $2(NL)^2$ for correlation matrices) and will even increase the computational
complexity, since the correlation matrices are not diagonal.

• The correlation matrices in the frequency-domain can be approximated by
diagonal matrices, since $F k^{T} k F^{-1}$ in Algorithm 3 can be well approximated by $I_{2L}/2$.
Hence, the speech and the noise correlation matrices are updated as

$S_{ij}[k] = \lambda S_{ij}[k-1] + (1-\lambda)\, Y_i^{H}[k]\, Y_j[k]/2,$    (equation 78)
$S_{ij}^{n}[k] = \lambda S_{ij}^{n}[k-1] + (1-\lambda)\, Y_i^{n,H}[k]\, Y_j^{n}[k]/2,$    (equation 79)

leading to a significant reduction in memory usage and computational complexity,
while having a minimal impact on the performance and the robustness. This algorithm
will be referred to as Algorithm 4.
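Because the matrices in (eq.78) and (eq.79) are diagonal, each update reduces to one complex multiply-accumulate per frequency bin. A minimal sketch, assuming the 2L diagonal entries are stored as lists of Python complex numbers (the function name and the toy values are hypothetical):

```python
def update_diag_correlation(S_prev, Y_i, Y_j, lam):
    """(eq.78)/(eq.79): per-bin update of a diagonal correlation matrix.

    S_prev, Y_i and Y_j hold the diagonal entries, one complex value per bin.
    """
    return [lam * s + (1.0 - lam) * yi.conjugate() * yj / 2.0
            for s, yi, yj in zip(S_prev, Y_i, Y_j)]


# Two frequency bins, starting from an all-zero estimate
S = update_diag_correlation([0j, 0j],
                            Y_i=[1 + 1j, 2 + 0j],
                            Y_j=[1 + 1j, 0 + 1j],
                            lam=0.5)
```

The same function serves for the speech and the noise correlation matrices; only the input spectra and the periods during which it is called differ.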
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Algorithm 3 Frequency-domain implementation with correlation matrices (without 
approximation) 

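Initialisation and matrix definitions:

$W_i[0] = [\,0 \;\cdots\; 0\,]^T,\; i = M-N \ldots M-1$

$P_m[0] = \delta_m,\; m = 0 \ldots 2L-1$

$F$ = 2L×2L-dimensional DFT matrix

$g = \begin{bmatrix} I_L & 0_L \\ 0_L & 0_L \end{bmatrix},\quad k = [\,0_L \;\; I_L\,]$

$0_L$ = L×L-dim. zero matrix, $I_L$ = L×L-dim. identity matrix

For each new block of L samples (per channel):

$d[k] = [\,y_0[kL-\Delta] \;\cdots\; y_0[kL-\Delta+L-1]\,]^T$

$Y_i[k] = \mathrm{diag}\left\{ F\, [\,y_i[kL-L] \;\cdots\; y_i[kL+L-1]\,]^T \right\},\; i = M-N \ldots M-1$

Output signal:

$e[k] = d[k] - k F^{-1} \sum_{j=M-N}^{M-1} Y_j[k]\, W_j[k],\quad E[k] = F k^{T} e[k]$

If speech detected:

$S_{ij}[k] = (1-\lambda) \sum_{l=0}^{k} \lambda^{k-l}\, Y_i^{H}[l]\, F k^{T} k F^{-1}\, Y_j[l] = \lambda S_{ij}[k-1] + (1-\lambda)\, Y_i^{H}[k]\, F k^{T} k F^{-1}\, Y_j[k]$

If noise detected: $Y_i[k] = Y_i^{n}[k]$

$S_{ij}^{n}[k] = (1-\lambda) \sum_{l=0}^{k} \lambda^{k-l}\, Y_i^{n,H}[l]\, F k^{T} k F^{-1}\, Y_j^{n}[l] = \lambda S_{ij}^{n}[k-1] + (1-\lambda)\, Y_i^{n,H}[k]\, F k^{T} k F^{-1}\, Y_j^{n}[k]$

Update formula (only during noise-only periods):

$R_i[k] = \frac{1}{\mu} \sum_{j=M-N}^{M-1} \left[ S_{ij}[k] - S_{ij}^{n}[k] \right] W_j[k],\; i = M-N \ldots M-1$

$W_i[k+1] = W_i[k] + F g F^{-1} \Lambda[k] \left\{ Y_i^{n,H}[k]\, E[k] - R_i[k] \right\},\; i = M-N \ldots M-1$

with

$\Lambda[k] = \frac{2\rho'}{L}\, \mathrm{diag}\left\{ P_0^{-1}[k], \ldots, P_{2L-1}^{-1}[k] \right\}$

$P_m[k] = \gamma P_m[k-1] + (1-\gamma) \left( P_{1,m}[k] + P_{2,m}[k] \right),\; m = 0 \ldots 2L-1$

$P_{1,m}[k] = \sum_{j=M-N}^{M-1} \left| Y_{j,m}^{n}[k] \right|^2,\quad P_{2,m}[k] = \frac{1}{\mu} \left| \sum_{j=M-N}^{M-1} \left( S_{jj,m}[k] - S_{jj,m}^{n}[k] \right) \right|,\; m = 0 \ldots 2L-1$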
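When the correlation matrices are stored as diagonals (the approximation that leads to Algorithm 4), the regularisation term and the speech power term of the step size reduce to bin-wise sums over the channels. The following sketch illustrates this under that assumption; the nested-list data layout and the function names are hypothetical.

```python
def regularisation_term(S, S_n, W, mu):
    """R_i = (1/mu) * sum_j (S_ij - S^n_ij) W_j, evaluated bin by bin.

    S[i][j] and S_n[i][j] are lists with the per-bin diagonal entries,
    W[j] is the list of per-bin filter coefficients of channel j.
    """
    n_ch = len(W)
    n_bins = len(W[0])
    return [
        [sum((S[i][j][m] - S_n[i][j][m]) * W[j][m] for j in range(n_ch)) / mu
         for m in range(n_bins)]
        for i in range(n_ch)
    ]


def speech_power_term(S, S_n, mu):
    """P_2,m = (1/mu) * | sum_j (S_jj,m - S^n_jj,m) | for every bin m."""
    n_ch = len(S)
    n_bins = len(S[0][0])
    return [abs(sum(S[j][j][m] - S_n[j][j][m] for j in range(n_ch))) / mu
            for m in range(n_bins)]


# Two channels, one frequency bin (toy values)
S = [[[4 + 0j], [1 + 1j]], [[1 - 1j], [3 + 0j]]]
S_n = [[[1 + 0j], [0j]], [[0j], [1 + 0j]]]
W = [[1 + 0j], [2 + 0j]]
R = regularisation_term(S, S_n, W, mu=2.0)
P2 = speech_power_term(S, S_n, mu=2.0)
```

The double loop over the channels i and j is the origin of the quadratic dependence on the number of input channels.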
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[0089] Table 2 summarises the computational complexity and the memory usage of the 
frequency-domain NLMS-based SPA for implementing the QIC-GSC and the frequency- 
domain stochastic gradient algorithms for implementing the SP-SDW-MWF (Algorithm 2 
and Algorithm 4). The computational complexity is again expressed as the number of Mega 
operations per second (Mops), while the memory usage is expressed in kWords. The 
following parameters have been used: M=3, L=32, $f_s$ = 16 kHz, $L_{buf_1}$ = 10000, (a) N=M-1, (b)
N=M. From this table the following conclusions can be drawn:

• The computational complexity of the SP-SDW-MWF (Algorithm 2) with filter
$w_0$ is about twice the complexity of the QIC-GSC (and even less if the filter $w_0$ is not
used). The approximation of the regularisation term in Algorithm 4 further reduces
the computational complexity. However, this only remains true for a small number of
input channels, since the approximation introduces a quadratic term $O(N^2)$.

• Due to the storage of data samples in the circular speech + noise buffer $B_1$, the
memory usage of the SP-SDW-MWF (Algorithm 2) is quite high in comparison with
the QIC-GSC (depending on the size of the data buffer $L_{buf_1}$ of course). By using the
approximation of the regularisation term in Algorithm 4, the memory usage can be
reduced drastically, since now diagonal correlation matrices instead of data buffers
need to be stored. Note however that also for the memory usage a quadratic term
$O(N^2)$ is present.

Computational complexity

Algorithm                                   | update formula                                                    | step size adaptation         | Mops
NLMS based SPA                              | (14M - 11 - [illegible]) + (6M-2) log2(2L) MAC + 1/L Sq + 1/L D   | (2M + 2) MAC + 1 D           | [illegible]
SG with LP (Algorithm 2)                    | (26N + 4 - [illegible]) + (6N + 10) log2(2L) MAC                  | (4N + 6) MAC + 1 D + 1 Abs   | 3.22 (a), 4.27 (b)
SG with correlation matrices (Algorithm 4)  | (10N^2 + 13N - [illegible]) + (6N + 4) log2(2L) MAC               | (2N + 4) MAC + 1 D + 1 Abs   | [illegible]

Memory usage

Algorithm                                   | memory (words)             | kWords
NLMS based SPA                              | 4(M-1)L + 6L               | 0.45
SG with LP (Algorithm 2)                    | 2N L_buf1 + 6LN + 7L       | 40.61 (a), 60.80 (b)
SG with correlation matrices (Algorithm 4)  | 4LN^2 + 6LN + 7L           | 1.12 (a), 1.95 (b)

Table 2
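The memory figures of Table 2 follow directly from the memory formulas and the stated parameters (M=3, L=32, $L_{buf_1}$=10000). The short sketch below reproduces them; the function name is hypothetical and one word per stored real value is assumed.

```python
def memory_words(M, N, L, L_buf1):
    """Memory usage in words according to the formulas of Table 2."""
    return {
        "NLMS based SPA": 4 * (M - 1) * L + 6 * L,
        "SG with LP (Algorithm 2)": 2 * N * L_buf1 + 6 * L * N + 7 * L,
        "SG with correlation matrices (Algorithm 4)": 4 * L * N ** 2 + 6 * L * N + 7 * L,
    }


case_a = memory_words(M=3, N=2, L=32, L_buf1=10000)  # (a) N = M-1
case_b = memory_words(M=3, N=3, L=32, L_buf1=10000)  # (b) N = M
kwords_a = {name: round(words / 1000, 2) for name, words in case_a.items()}
kwords_b = {name: round(words / 1000, 2) for name, words in case_b.items()}
```

The data buffer term $2NL_{buf_1}$ dominates the memory usage of Algorithm 2, whereas for Algorithm 4 the largest term is $4LN^2$, which stays small for a small number of input channels.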



[0090] It is now shown that practically no performance difference exists between 
Algorithm 2 and Algorithm 4, such that the SP-SDW-MWF using the implementation with 
(diagonal) correlation matrices still preserves its robustness benefit over the GSC (and the 
QIC-GSC). The same set-up has been used as for the previous experiments. 

The performance of the stochastic gradient algorithms in the frequency-domain is evaluated
for a filter length L=32 per channel, $\rho'$=0.8, $\gamma$=0.95 and $\lambda$=0.9998. For all considered
algorithms, filter adaptation only takes place during noise only periods. To exclude the effect
of the spatial pre-processor, the performance measures are calculated with respect to the
output of the fixed beamformer. The sensitivity of the algorithms against errors in the
assumed signal model is illustrated for microphone mismatch, i.e. a gain mismatch $\Upsilon_2$ = 4 dB
at the second microphone.

[0091] Fig. 15 and Fig. 16 depict the SNR improvement $\Delta SNR_{intellig}$ and the speech
distortion $SD_{intellig}$ of the SP-SDW-MWF (with $w_0$) and the SDR-GSC (without $w_0$),
implemented using Algorithm 2 (solid line) and Algorithm 4 (dashed line), as a function of
the trade-off parameter $1/\mu$. These figures also depict the effect of a gain mismatch $\Upsilon_2$ = 4
dB at the second microphone. From these figures it can be observed that approximating the
regularisation term in the frequency-domain only results in a small performance difference.
For most scenarios the performance is even better (i.e. larger SNR improvement and smaller
speech distortion) for Algorithm 4 than for Algorithm 2.

[0092] Hence, also when implementing the SP-SDW-MWF using the proposed Algorithm
4, it still preserves its robustness benefit over the GSC (and the QIC-GSC). E.g. it can be
observed that the GSC (i.e. SDR-GSC with $1/\mu$=0) will result in a large speech distortion
(and a smaller SNR improvement) when microphone mismatch occurs. Both the SDR-GSC
and the SP-SDW-MWF add robustness to the GSC, i.e. the distortion decreases for increasing
$1/\mu$. The performance of the SP-SDW-MWF (with $w_0$) is again hardly affected by
microphone mismatch.
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